Signals of Quality

Last year at SxSW Interactive, I watched a panel called Web Standards and Search Engines: Searching for Common Ground, which discussed the use of semantic markup and its relationship to search engines. Of course a web standards compliant website will help you to a certain degree with search engines; this does not need to be proven. But how much does it actually help?

Look at the Amazons and eBays of the online world and you will find high-ranking, non-compliant websites that show few or no signs of future development to make them accessible, compliant products.

One issue that was discussed briefly during this panel, and a question I have wanted to ask Google for some time now, is: “What if Google actively rewarded you for having an accessible or validating site?”

Would this be in their best interest? Would it have a negative impact on the search results?

Tim Mayer, Dir Product Manager of Yahoo! Search, discussed this issue by mentioning a basic criterion he dubbed “Signals of Quality”. The search engine algorithm basically assesses the HTML document and looks for anything that would give the search engine reasonable clues as to what the page is about. Tim comments:

There are many signals of quality that a search engine can use to evaluate what a page is all about, these will keep evolving as some of the old ‘signals’ get abused (e.g. meta-data spamming) and new signals will be introduced as internet usage evolves.
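To make the idea a little more concrete, here is a minimal sketch of what collecting a few markup-level hints from a page might look like. The names `SignalCollector` and `collect_signals` are made up for illustration, and the choice of title, headings and meta description is purely my assumption; nothing here reflects what Yahoo! or Google actually do.

```python
from html.parser import HTMLParser


class SignalCollector(HTMLParser):
    """Hypothetical collector of a few hints about what a page is about."""

    def __init__(self):
        super().__init__()
        # Assumed signals for illustration only: title, h1/h2 text, meta description.
        self.signals = {"title": "", "headings": [], "meta_description": ""}
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2"):
            self._current = tag
            if tag != "title":
                self.signals["headings"].append("")
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.signals["meta_description"] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.signals["title"] += data
        elif self._current in ("h1", "h2"):
            self.signals["headings"][-1] += data


def collect_signals(html):
    """Parse an HTML string and return the collected hints as a dict."""
    parser = SignalCollector()
    parser.feed(html)
    parser.close()
    return parser.signals
```

Note that a parser like this is forgiving: it happily pulls the same hints out of invalid markup, which is part of why validity on its own tells an engine so little.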

Now I know that there are many, many factors in a page that could be considered a “Signal of Quality”, and that they are continually evolving in this new era of social media, but could code validation be considered amongst them? On this, Tim answers:

If at some point being able to validate HTML becomes a huge signal of quality that may be something we would take into account. At this point it isn’t, so few websites validate that it isn’t a signal of quality that we believe would add to the relevance of our search engines.

Nobody validates, so we don’t use it as a measure of quality. This sounds like some serious complacency to me.

Surely search engines must be concerned with bandwidth and semantics? An entire web that validated would surely speed up the process of indexing and applying semantic value to the enormous quantity of websites they must index day to day.

We do know that Google is interested in accessibility to some degree now… but how is this current implementation helping? Is this considered a reward if you place number one in an experimental version of Google’s search engine because you have an accessible website?

What I would like to know is whether it would have a negative impact on the search engine results. If Google (let’s be real here, only Google has this power at the moment, sorry Yahoo and MSN) were to suggest even in the slightest that they might reward accessibility and code validation in their search engine results, then there would be a serious movement to clean up large websites across the web. Suddenly validation and accessibility would become a “Signal of Quality”.

Answers to these questions aren’t easy to find, but soon we may have our chance to ask Google.

Comments

Ben Buchanan says: February 5, 2007 @ 12:39 pm

so few websites validate that it isn’t a signal of quality

…incredibly bad logic there. If a site validates at this point in time, it indicates that someone has paid serious attention to the quality of the page. Surely a signal of quality! Maybe they don’t want to open that door since they’d then be admitting that their own pages suck.

Their interest in accessibility is minimal at best. Accessible search is treated as a bit of a curiosity, as far as I can tell. A neat toy produced by someone’s 20%, but that’s about it.

The thing I’ve come to realise about Google is that they do not consider inaction to be “doing evil”. Despite the tremendous influence they have, they don’t use it to “do good”. Personally I think their inaction is a form of doing evil, but that’s just me.

Standardzilla says: February 5, 2007 @ 1:24 pm

@Ben, I totally agree with that sentiment. No sites will clean up their act when there is no push from Google.

Pixel Invasion says: February 6, 2007 @ 10:36 am

The problem with search engines using valid markup as a signal of quality is that it has nothing to do with the quality of the content of the site, just the way it is structured. Do you want a search engine to bring you to a site because the content is semantically structured and the page validates, or because the content is superior? It’s a fine line, and I can’t imagine any search engine adopting this as a significant signal of quality any time soon.

While the push would make all the work we’re doing worthwhile, it is not in the search engine’s best interest, nor in the public’s best interest… not at this stage anyway.

Rene Saarsoo says: February 6, 2007 @ 10:44 am

I don’t really see why Google should invest in validating billions and trillions of web pages. Validating all the pages on the web takes a considerable amount of time and resources, and the return on this investment is questionable.

First of all, there are a lot of pages on the web that are perfectly valid but contain absolutely no useful content, like some error pages generated by web servers. Actually, the smaller the page, the greater the probability of it being valid: often page authors who don’t strive for valid HTML just happen to produce it when the page is small enough.

You can also produce a perfectly valid page by using Mozilla Composer or some other simple WYSIWYG editor that happens to produce valid HTML. But websites developed with those kinds of tools don’t usually contain quality content. (Conversely, if someone is using a WYSIWYG editor that produces invalid HTML, should he get punished?)

Also, one can use a CMS that produces valid code. Again, there is no connection with better content.

Of course, there are a lot of valid pages with quality content, but can you prove that valid pages have better content than invalid ones?

Without hard data to back up your claim, it’s just a wild guess, because when you say that, based on your experience, valid pages have better content, then I can say that, based on my experience, the sun goes around the Earth.

standardzilla says: February 6, 2007 @ 11:49 am

@pixelinvasion - the same argument applies to meta-data then: although meta-data doesn’t count for so much anymore, it still counts for *something*. Validation says nothing about the content, but why can it not be a factor (and a small one at that)?

@Rene - Google doesn’t have to invest anything. Google only has to mention that validation is a *factor* in assessing a page… it would be vague and not provide any detail about how much (as with most of Google’s web developer guidelines), and this would only have an impact on products coming out in the future.

Content would surely beat validation. I am not saying validation would be the new PageRank; that would not be realistic.

The point here is *does validation have negative effects?*

Hobo SEO says: May 26, 2007 @ 10:12 am

I recently ran a test on my blog to see how Google treats accessible pages. The first results were positive, i.e. out of four pages it picked the valid page to rank, but now it seems to have picked an invalid page. I don’t think Google gives a damn about small errors or accessibility barriers.

Standardzilla says: May 26, 2007 @ 1:04 pm

@Hobo - you are correct… I just think it’s good form to make sure a spider crawls your website as efficiently as possible, but Google will not penalise your site or anything like that.

Turk Hit Box says: May 26, 2007 @ 6:47 pm

I always validate my pages no matter what. I just want to make sure that I don’t lose even the slightest bit of quality that Google might reward my page with.

I don’t know if it’s Black Hat or White Hat, but you can use some simple code to show spiders “printable” versions of your pages which validate. Printable versions usually carry less code and more content.