Written by Long Beach Web Designers Saturday, 04 September 2010 08:59
Posted by Dana Lookadoo
This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author's views are entirely his or her own and may not reflect the views of SEOmoz, Inc.
How do I recap the SEOmoz PRO Seminar session on Uncovering a Hidden Technique for SEO? The title is so attractive that it produces Pavlovian symptoms as we salivate at the thought of uncovering a hidden SEO treasure. Ben Hendrickson of SEOmoz presented a model which appears to show how Google may be assigning relevance to keyword terms based on context - topical relevance.
Is Latent Dirichlet Allocation (LDA) that hidden jackpot?
1st - LDA is neither new nor something SEOmoz invented. The Information Retrieval model has been around for 7 or 8 years, and IR geeks have talked about it before. There are a number of resources, as well as naysaying, about LDA and Google's possible use of it.
2nd - What is new is SEOmoz's LDA Topics Tool that produces a relevancy score based off a query (search term). It enables one to play with words that may increase a page's relevancy in the eyes of Google. It shows words that help Google determine how relevant the page is to a user's search query.
Game Changer?
Kyle Stone tweeted that the LDA tool is a game changer, and many retweeted.
Is SEOmoz's LDA tool a game changer? That's yet to be seen. The goal is to report Ben's research as presented at the Mozinar and how a layman (myself) interprets such. Rand is going to do a follow-up post to explain more.
Why all the hype?
The SEO Challenge
SEOs face the continual challenge of figuring out Google's hidden ranking algorithms. How do we rank higher? Which signals are the most important? We know search engines are "learning models" that attempt to understand the "context" of words. Google has said for years that webmasters should concentrate most on providing good, relevant (contextual) content.
There are ways to rank higher. Is it as easy as 1, 2, 3?
- Create quality copy with keyword(s) on the page along with associated anchor text links.
- Get good links.
- What Ben talked about in this session.
LDA - Topic Modeling & Analysis
Latent Dirichlet Allocation, in layman's terms, translates to "topic modeling." In search geek terms, LDA is the following formula:
(Did you digest that? Don't worry; Mozzers groaned and laughed at the same time. PLUS: Scientist Hendrickson delivered this session after lunch!)
LDA Simplified - Here is Ben's way of explaining topic modeling:
(Okay, I was once proud that I got an A in Logic and Combinatorics - discrete math/set theory. However, that computer science class now feels like basic math compared to this formula.)
It made more sense when Rand Fishkin joined Ben on stage and when Todd Freisen moderated and deciphered during Q&A. (Manuela Sanches of Brazil was sitting next to me and said that Ben's "presentation needed subtitles!")
The objective of LDA, from my deciphering of Greek, is to understand how Google is using semantic contextual analysis combined with other signals, to define topics/concepts. It's how Google analyzes the words on a page to determine the "set" to which a word belongs - how relevant a search query is to pages in its database.
For example: How does Google assign relevance to the word "orange" on a page? They determine orange is related to the fruit set or to the color set by page context.
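The "orange" example can be sketched in a few lines of Python. This is only a toy illustration of picking a "set" from page context, not Google's method or the SEOmoz tool: the two topic vocabularies, their probabilities, and the best_topic helper are all made up for the example, whereas a real topic model would learn such numbers from a large corpus.

```python
import math

# Hypothetical, hand-picked topic vocabularies (assumed for illustration only).
TOPICS = {
    "fruit": {"orange": 0.20, "juice": 0.25, "peel": 0.15, "vitamin": 0.20, "tree": 0.20},
    "color": {"orange": 0.20, "red": 0.25, "paint": 0.20, "bright": 0.15, "hue": 0.20},
}


def best_topic(page_words, topics=TOPICS, floor=1e-3):
    """Return the topic whose word distribution best explains the page words."""
    scores = {}
    for name, dist in topics.items():
        # Sum of log-probabilities; words the topic has never seen
        # get a small floor probability instead of zero.
        scores[name] = sum(math.log(dist.get(word, floor)) for word in page_words)
    return max(scores, key=scores.get)


print(best_topic(["orange", "juice", "vitamin"]))
print(best_topic(["orange", "paint", "bright"]))
```

The word "orange" is equally likely in both sets here, so it is the surrounding words that tip the decision toward fruit or color.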
LDA Defined:
"Latent Dirichlet Allocation (Blei et al, 2003) is a powerful learning algorithm for automatically and jointly clustering words into "topics" and documents into mixtures of topics. It has been successfully applied to model change in scientific fields over time (Griffiths and Steyvers, 2004; Hall, et al. 2008).
A topic model is, roughly, a hierarchical Bayesian model that associates with each document a probability distribution over "topics", which are in turn distributions over words."
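To make that definition a little more concrete, here is a minimal sketch of the "generative story" behind LDA: each document draws its own mixture over topics, and each word is drawn from one of those topics. The two toy topics, the alpha value, and the function names are assumptions made up for this example; a real model learns the topics from data and works backwards from observed words to the mixtures.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
TOPIC_WORDS = {
    "seo": {"links": 0.4, "ranking": 0.3, "keywords": 0.3},
    "fashion": {"runway": 0.5, "dress": 0.3, "shoes": 0.2},
}


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [draw / total for draw in draws]


def generate_document(n_words, alpha=0.5, seed=None):
    """Return (topic_mixture, words) following the LDA generative story."""
    rng = random.Random(seed)
    names = list(TOPIC_WORDS)
    # 1. Each document gets its own mixture over topics.
    mixture = sample_dirichlet([alpha] * len(names), rng)
    words = []
    for _ in range(n_words):
        # 2. Pick a topic for this word position from the document's mixture.
        topic = rng.choices(names, weights=mixture)[0]
        # 3. Pick a word from that topic's distribution over words.
        dist = TOPIC_WORDS[topic]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return dict(zip(names, mixture)), words


mixture, words = generate_document(8, seed=1)
print(mixture)
print(words)
```

Running it with different seeds gives documents that lean more toward one topic or the other, which is the "mixtures of topics" idea in the quote.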
Bayesian - ah, a term I recognize!! Bayesian spam filtering is a method used to detect spam. It draws off a database and learns the meaning of words. It's "trained" by us when we mark an email as spam. It looks at incoming emails and calculates the probability that the content of an email is contextually spammy.
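For the curious, a Bayesian spam filter of the kind described above can be sketched roughly as follows. This is a bare-bones naive Bayes classifier with add-one smoothing, written for illustration; the TinySpamFilter class and the training messages are made up, and real filters are considerably more involved.

```python
import math
from collections import Counter


class TinySpamFilter:
    """Toy naive Bayes filter; train on at least one spam and one ham message."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # "Training" is just counting words per label, as when we mark an email.
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: share of training messages carrying this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one (Laplace) smoothing so unseen words never give log(0).
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a normalized probability of spam.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / sum(weights.values())


spam_filter = TinySpamFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap pills free offer", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch meeting tomorrow", "ham")
print(spam_filter.spam_probability("cheap pills"))
print(spam_filter.spam_probability("meeting notes"))
```

The first message should score well above 0.5 and the second well below it, because the filter has only ever seen those words in one kind of email.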
I found a PowerPoint presentation about Bayesian Inference Techniques by Microsoft Research from 2004 that presents the possibility of using LDA. Go to slide 54 and read:
"Can we build a general-purpose inference engine which automates these procedures?"
Microsoft has been looking at LDA models. Do search engines use it as one of their primary methods?
Ben sampled over 8 million documents with approx. 1,000 queries. He believes Google is using LDA topic modeling to determine (learn) what words mean by their associations with, and relevance to, other words on the page. (Other factors are included.) Ben called the results a "co-occurrence explanation" that uses a "cosine similarity."
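Cosine similarity itself is simple to compute: it measures how closely two vectors point in the same direction, regardless of their length. The sketch below compares a query with two pages, each described as a mix of three topics. The topic weights are invented for the example, and this is not a claim about how the LDA tool represents pages internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical topic mixtures over the same three topics.
query_topics = [0.7, 0.2, 0.1]
page_a_topics = [0.6, 0.3, 0.1]  # leans toward the same topic as the query
page_b_topics = [0.1, 0.2, 0.7]  # leans toward a different topic

print(cosine_similarity(query_topics, page_a_topics))
print(cosine_similarity(query_topics, page_b_topics))
```

Page A scores much closer to 1 than page B, which is the sense in which its content is more "on topic" for the query.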
SEO Takeaway:
- Results that are higher in Google SERPs, in general, have more topical content.
- Search engines do APPEAR to apply semantic analysis... when indexing a page and determining the intent of the words on the page.
Rand tweeted an explanation (in 140 x 4) as follows:
Dana's LDA Catwalk Metaphor for Topic Modeling:
Imagine the words on your page as walking down the fashion runway in Paris. Your keyword phrase is "dressed" in semantic accessories, words that correlate to and dress up your topic. Associated words bring meaning to and highlight the fashion model's outfit. Adjectives, modifiers and synonyms are like jewelry, hats, and shoes. The combination can transform your base layers (your target terms) from casual or conservative business attire into a sexy night-on-the-town ensemble.
Combinations and permutations of words on a page "dress" your skinny or curvy fashion model. Relevant words provide Google with an image of what she is wearing and the catwalk upon which she struts. LDA refers back to what Google already knows about these "accessories" (words) and their previous association with the topic terms related to fashion.
Enter Topical Ambiguity - I just broke the "rules" for context with the catwalk metaphor by referring to modeling in two contexts on this page:
- I used "modeling" terms that relate to the "fashion industry" set.
- The catwalk metaphor is irrelevant content that is off-topic for discussing "LDA topic modeling."
Google Algorithm Exposed?
Ben clearly said that LDA is an ATTEMPT to explain the SERPs. His scenario, a quote from his presentation slides, follows:
One of us needs to implement it so we can:
1) See how it applies to pages
2) See if it helps explain SERPs
One-two-three-not-it.
LDA is not LSI.
There were some tweets claiming SEOmoz was bringing back LSI or snake oil. Ben clarified that LDA is not LSI, which deals more with keyword density. He explained that he is NOT talking about loading keywords on a page but about the relevance of the topics within the page. He said that:
"LSI doesn’t have the same bias toward simple explanations. LSI breaks down as you try to scale up the number of topics."
The LDA tool deals with context, semantic relevancy, not density - in addition to some other random factors. Example:
If SEOmoz has a page all about "SEO" and "tools," and another word on the page is best explained as belonging to the SEO topic, then that related word counts toward the topic. Meaning, "seo tools" doesn't have to be repeated over and over, and the related word would be interpreted by Google as being relevant.
Ben, who appears to have the brain of a search engine, noted that it "appears" LDA is what Google is heading for in the near future. He said (paraphrased):
If they are not doing it, they seem to be doing something that has the same output. They are probably already using it.
Rand deciphered:
It’s a super weird coincidence if Google is not using it.
Are On-Page Signals Stronger than Links?
Are we heading toward more emphasis on on-page topic modeling? I'm not an IR geek, but I do plan to spend more energy focusing on understanding how search engines retrieve information. We are dealing with a semantic Web. LDA may indicate that good old on-page optimization sends stronger signals than links.
SEOmoz's LDA tool attempts to show how relevant content is to a chosen keyword. It computes relevance of queries.
The following compares how relevant SEOmoz's Tools page and Aaron Wall's SEO Book Tools page are to the same query.
The score at the top is an indicator of how relevant the content on that page is according to LDA.
- Aaron's content is 72%* relevant for the query "seo tools."
- SEOmoz's tools page is 40%* relevant.
*NOTE: (I inserted the logos.) You can run the same pages and get different results. The results are similar in that SEO Book always scored as more topically relevant, but the percentage varies. Is this the random Monte Carlo algorithm at work? Ben?
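Whether that is what happens inside the LDA tool is a question for Ben, but the general effect is easy to demonstrate. Any score estimated from random samples will wobble from run to run, and the wobble shrinks as the number of samples grows. The sketch below is purely illustrative: the word lists and the estimate_topic_share function are invented, and it is not the tool's algorithm.

```python
import random


def estimate_topic_share(page_words, topic_words, n_samples, seed=None):
    """Estimate the share of on-topic words by sampling page words at random."""
    rng = random.Random(seed)
    hits = sum(rng.choice(page_words) in topic_words for _ in range(n_samples))
    return hits / n_samples


# Hypothetical page: 6 of its 10 words belong to the (made-up) topic vocabulary.
page = ["seo", "tools", "links", "rank", "crawl", "keyword",
        "coffee", "weather", "music", "travel"]
topic = {"seo", "tools", "links", "rank", "crawl", "keyword"}

# Few samples: the estimate moves around from run to run.
print([estimate_topic_share(page, topic, 25, seed=s) for s in range(5)])
# Many samples: the estimates settle near the true share of 0.6.
print([estimate_topic_share(page, topic, 20000, seed=s) for s in range(5)])
```

The ordering of two clearly different pages tends to survive this kind of noise even when the exact percentages do not, which matches the behavior described in the note above.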
Mozinar Question:
"How do we execute this for SEO?"
Ben's Answer:
"I don't actually do SEO. I write code."
That's up to us, the SEOs, to play and test in our Google playground.
Use the tool to decide if you can win with LDA to optimize your on-page signals.
- Use the LDA Topics Tool to return words that could be used on a page for a query.
- Then determine who is ranking for that term.
- Simply write content that is highly on-topic based off the findings you observe.
If you are not performing that well in the SERPs, think about classic on-page optimization. In the example above, rather than putting another instance of "seo tools" on the page, LDA shows there are better ways to tell Google that you are about that topic. The tool provides a way to measure that.
IMPORTANT: There is a threshold at which too many related words will appear as too spammy. LDA is not something to be used to game Google.
Test the LDA Tool out for yourself, and draw your own conclusions.
***
DISCLAIMER: I'm not claiming this methodology has uncovered hidden SEO treasures. Time, testing and playing around with a new SEOmoz tool while observing the SERPs will reveal the answer. In the meantime, I'm going to dress up my pages and accessorize them with relevant terms that make them dazzle so they look good climbing the Google catwalk.
Written by Long Beach Web Designers Friday, 03 September 2010 18:21
No, not like that, but in the good way! :D
The following is a guest post by Jim Kukral highlighting one of the most fundamental tips to succeeding online.
Have you ever really taken a step back from all the technical SEO stuff and thought about why Google wins? The real reasons why they have mass-market share and why they continue to dominate? It's time you did, because once you understand how to start thinking like Google, you can finally go beyond just ranking better and learn how to be a master Internet marketer, so you can get more sales, leads and publicity.
After all, once you've been found, you now have to convert. Otherwise, it's a waste of time.
So why does Google win? Because Google is the world's biggest, and best, problem solver. The truth is that there are only two reasons why we all go online, using Google or not. Those two reasons are:
1. To have a problem solved
2. To be entertained
That's it. Everything, and I mean everything you do online falls under one of those categories. For example, let's say you're planning on cooking your wife her favorite chicken marsala dish for your anniversary. You go online and do a search for "chicken marsala recipes". Boom, you now have recipes, and videos, and images and cookbooks and all kinds of information to help you solve your problem.
As another example, let's say you wanted to relax after work and watch your favorite musician play some of your favorite songs. You go to YouTube and do a search for "Rolling Stones Videos" and boom, you're now watching video content that entertains you.
YouTube, which is owned by Google, is already the number two most searched search engine on the Internet (behind Google of course). That means that today billions of people are actively searching the Internet for video content. It also means that, because of the public's fast-growing hunger for content in video form, regular people and businesses alike are now able to profit from creating that video content.
The truth is, Google (and your business) has to solve problems for their (your) customers, the Internet searcher. If they (you) can't do that, they (you) lose customers. It's that black and white.
So I'll ask you again. Are you thinking like Google? Have you sat down and figured out what your target audience's biggest problems are? If you haven't done that you need to do it now. Anticipate what they need. Figure out their pain and then create products/services that take that pain away.
Just like Google.
For over 15 years, Jim Kukral has helped small businesses and large companies like Fedex, Sherwin Williams, Ernst & Young and Progressive Auto Insurance understand how to find success on the Web. Jim is the author of the book, "Attention! This Book Will Make You Money", as well as a professional speaker, blogger and Web business consultant. Find out more by visiting www.JimKukral.com. You can also follow Jim on Twitter @JimKukral.
Written by Long Beach Web Designers Thursday, 02 September 2010 17:10
Posted by Aaron Wheeler
In this week's Whiteboard Friday Rand Fishkin clues you in on four link building tactics that you likely haven't heard about. Given the importance of link building to SEO, this video should prove to be worth its (virtual) weight in gold. (I mean that in the best possible way ;-p)

