Although I started out to write a report on Demand Media, it ended up being more about the online content problem, Google, Quora and the prospects for new search engines. The full report (PDF) is available here: Demand Media Crapification.
The report is a bit long for a single email or blog post so over the next few days I’ll post the three sections of content in order. The first section deals with what’s behind an alarming decline in the general content returned from the Internet.
After 15,000 hours of hands-on research it’s clear that most of the easily accessed Internet content, particularly what is returned from Google search results, is crap. Although generally acknowledged by people in the hard-core knowledge worker segment, this viewpoint is now becoming mainstream.
So how did things get this bad? While advertising may be the “root of all evil,” how deep it goes depends on the conditions. The number and size of the forces at work here are prodigious. Here are some of the more technical and content-related forces:
“Search Engine Optimization,” or SEO, games the system by exploiting the algorithms Google uses to rank results. By relying on computers to rank search results, we invite an endless contest between those who find ways to outsmart the algorithms and the search engines trying to return valuable information. Google sometimes changes its algorithms in an attempt to foil current strategies, but in doing so it opens up other loopholes for exploiters to use. Another negative side effect is that these changes sometimes upset previously good search rankings in what amounts to a big reshuffling. Google has much better algorithms for returning search results, but we are unlikely to see them implemented; more on that later.
Dumb Clicks: Generally, the more clicks something gets, the higher it ranks. It doesn’t matter if the clicks were of the “fooled you, there’s no real content here” variety. (See the attached example of a common top-ranked page.) Because people tend to click on something near the top of the results, this crap keeps getting clicks. Even a senior search engineer at Google observing user behavior said: “How can you not see that this is a spam page and click on it?!” Time only makes this pattern worse as this type of content “crowds out” everything else.
Tricksters & Cheap Shots: One example of this category is alternativeto.net. In this case, someone must have noticed the popular search technique of using “alternative to,” as in “alternative to Photoshop.” Suddenly most queries phrased this way returned crappy results full of ads rather than useful information. There is a legion of mostly small-time operators that use domain misspellings and other simple ploys to capture traffic and a few clicks on “parked” domains. This class of problem is fairly easy to solve but still pops up from time to time, like a disease you can’t quite eradicate.
Content Farms: This is where players like Associated Content (acquired by Yahoo), About.com and Demand Media take the game to a new level. Content farms are harder to counter because they invest some money in “real” content to insert themselves into search results. We’ll save the analysis for the following section on Demand Media. Content farms may seem beneficial compared to the SEO villains and tricksters but that’s what makes them insidious.
Syndication: Many sites are so desperate for content they are willing to syndicate just about anything. Because they are often following some of the same SEO strategies and now provide even more links to the original content source, the ranking and ubiquity of a lousy piece of content solidifies like a plaque to block normal information flow.
Filter Failure: “Leakage” is when something that is supposed to protect us from garbage content begins to break down. Even paid-for services like CapitalIQ rely on automated filtering and eventually get penetrated by spam content that corrupts the feed. This is just another factor suggesting that effective filters in the future will require some level of human validation even if it also is automated. Some of those methods are described in the final section. (See this related video from Clay Shirky.)
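The “Dumb Clicks” feedback loop described above can be sketched in a few lines of Python. This is a toy model, not how Google actually ranks anything: the page names, the quality scores, and the assumption that clicks fall off as 1/position are all made up for illustration. The only point is that a ranker that counts clicks, and nothing else, keeps whatever is already on top on top.

```python
def simulate_click_ranking(pages, rounds=50, searches_per_round=1000):
    """Toy model of a click-count ranker with position bias.

    pages: list of (name, quality) tuples in their initial rank order.
    The quality score is carried along only to show that this ranker
    never looks at it (hypothetical data, not from the report).
    """
    # Assumed position bias: the result at position i gets a share of
    # clicks proportional to 1 / (i + 1), regardless of what it is.
    raw = [1.0 / (i + 1) for i in range(len(pages))]
    total = sum(raw)
    position_share = [w / total for w in raw]

    clicks = {name: 0.0 for name, _quality in pages}
    order = [name for name, _quality in pages]

    for _ in range(rounds):
        # Expected clicks this round depend only on current position.
        for position, name in enumerate(order):
            clicks[name] += searches_per_round * position_share[position]
        # Re-rank purely by accumulated clicks: "a click is a click".
        order = sorted(order, key=lambda n: clicks[n], reverse=True)

    return order, clicks


pages = [
    ("thin-ad-page", 0.1),    # low quality, but starts at the top
    ("average-page", 0.5),
    ("expert-article", 0.9),  # high quality, starts at the bottom
]
final_order, final_clicks = simulate_click_ranking(pages)
print(final_order)
```

In this sketch the thin page never loses the top spot, and its lead in accumulated clicks widens every round, which is the “crowding out” effect in miniature.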
Two other more anthropological factors play a major role in the drive to lower quality content:
Recency bias: Good content has long been pushed out of focus by inferior versions because they are newer. This is true in long-standing categories like movies and books. While freshness has value, it ends up being counterproductive in many cases and results in a “reinventing of the wheel” in the case of factual, well-reasoned and documented content. Good content is often not “sticky,” fades away quickly, and becomes hard if not impossible to find. Wikipedia is one example where this is not the case, which is why so many people search it explicitly.
Not all clicks are created equal: There’s a strong inverse relationship between intelligence / expertise / judgment / insight and the tendency to click. On Google “a click is a click,” so content gets driven to the lowest common denominator. Steve Jobs probably makes fewer, more informed clicks and online decisions than Paris Hilton, but Paris and her ilk are what drive content rankings.
Solving some of these problems is actually easier than it would first appear, because harnessing the power of human behavior and adding more available information to the analysis can lead to excellent results. Next we’ll look specifically at Demand Media, the gorilla of the content farms.