The (Failing) Semantic Web

The web is a mess. Ever since use of the World Wide Web began to explode, browsers have been too forgiving of invalid markup. This, combined with the open nature of the web, which allows anyone to publish anything, has led to the vast quantity of poorly written pages that make up the web today.

Semantic Markup

There are numerous advantages to using accessible, semantic markup when creating web documents. As the term suggests, this technique allows content to be more accessible to a wider variety of people, through the increasing use of assistive technologies such as screen readers. When a document is structured semantically, the task of representing this to different audiences becomes much easier.
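As a rough illustration of why semantic structure makes re-presentation easier, the sketch below pulls a heading outline out of a page using Python's standard `html.parser`, much as a screen reader might offer a list of headings to jump between. The sample page and the `OutlineBuilder` and `outline` names are made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical page fragment that uses real heading elements.
SEMANTIC = """
<h1>Opening hours</h1>
<p>We are open every day.</p>
<h2>Weekdays</h2>
<p>9am to 5pm</p>
<h2>Weekends</h2>
<p>10am to 4pm</p>
"""

# The same top heading written presentationally: it looks like a heading
# in a browser, but nothing in the markup says that it is one.
PRESENTATIONAL = '<p><font size="6"><b>Opening hours</b></font></p>'


class OutlineBuilder(HTMLParser):
    """Collects (level, text) pairs for h1-h6 elements."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # heading level currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(markup):
    """Return the heading outline of an HTML fragment."""
    builder = OutlineBuilder()
    builder.feed(markup)
    builder.close()
    return builder.outline
```

Calling `outline(SEMANTIC)` yields one entry per heading, while `outline(PRESENTATIONAL)` yields an empty list: software cannot recover structure that the author only expressed visually.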

While semantic markup and the semantic web are separate concepts, the adoption of these technologies by the wider community can be seen as comparable. Both require additional effort on the part of the page author, and advantages can be seen in both cases.

Despite the accessibility benefits of semantic markup, and despite accessibility now being a legal requirement for any organisation offering a service to the public, a recent BBC report found that the majority of leading websites still fail to meet basic accessibility requirements.

So why are so few sites accessible? For most authors, accessibility advocates aside, there is no reason to put in the extra effort. Unlike a compiler for a conventional programming language, which rejects invalid code, a web browser will display syntactically invalid markup without complaint, so authors have no incentive to learn the ‘proper’ way. In addition, WYSIWYG editors often output invalid or inaccessible code, because it is difficult to add semantics to a page without knowing the context of its content.
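The difference in strictness can be sketched with Python's standard library, which happens to ship both kinds of parser. This is only an analogy for browser behaviour, not a model of any real browser; the `BROKEN` sample and the helper names are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Deliberately invalid: <b> and <i> overlap, and <p> is never closed.
BROKEN = "<p>Some <b>bold <i>and italic</b> text</i>"


class TextCollector(HTMLParser):
    """Collects text content without checking that tags are balanced."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(markup):
    """Forgiving, browser-style handling: take whatever text is there."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def strict_parse_ok(markup):
    """Compiler-style handling: reject anything that is not well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Here `lenient_text(BROKEN)` still returns the readable text, while `strict_parse_ok(BROKEN)` is false. An author who only ever meets the first behaviour never finds out that anything is wrong.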

The W3C’s Loss of Control

There is also a perceived loss of faith in the W3C, which originally set out to establish standards for web markup. Official specifications have stagnated in recent years, with the last major revision being XHTML 1.0, published in 2000. The draft XHTML 1.1 specification was released shortly afterwards, but six years later a final version has still not been published. In contrast, unofficial developments and proprietary technologies have experienced prolific growth and adoption.

JavaScript (officially ECMAScript), RSS (along with its podcasting extension), Flash, pingbacks, and more recently microformats are all unofficial or proprietary developments which have experienced near-universal adoption. Flash has arguably become the new standard for embedded media in web pages, thanks to its cross-platform compatibility and the fact that it does not rely on locally installed video codecs. In the time since the last revision to the official XHTML specification, Flash has received five major updates. While this technology is inherently inaccessible, it illustrates the speed of web development outside the W3C.

RSS was originally developed by Netscape (as was JavaScript), but now has three branches in development: RSS 1.0, RSS 2.0 and Atom. Despite its name, RSS 2.0 is a separate development branch which, unlike RSS 1.0, ignores the official RDF specification and instead uses simple XML without the concept of DTDs. RSS, like Flash, has become a universal mechanism without the support of the W3C.
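Part of the appeal of RSS 2.0 is how little is needed to consume it. The sketch below reads item titles from a feed with nothing but a generic XML parser from Python's standard library: no DTD and no RDF tooling. The feed contents, the example.com URLs and the `item_titles` name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed: plain elements, no DOCTYPE, no RDF namespaces.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def item_titles(feed_xml):
    """Return the title of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title") for item in root.iter("item")]
```

For the sample feed, `item_titles(FEED)` returns the two post titles in document order.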

I’ll post the rest of the article here in a few weeks, when I won’t get done for plagiarism or something.


Reader comments

  1. I think you hit the nail on the head when you say that there’s no perceived benefit for 99% of the population to add syntactically correct markup. Sure, that one percent of sites out there who can count a blind user as one of their patrons might see a benefit to ensuring that their product can be sold to him/her too, but when at least half of all sites (note I say sites, and not pages) are Joe Smith in his parents’ basement raging against the machine about why he’s grounded, there’s never going to be a mass exodus to proper markup.

    The other part of the problem is, as you said, the WYSIWYG editors that already add scores of useless markup as it is. I don’t think it’s due to the lack of context, but rather the fact that there really isn’t a single “standard”. Sure there’s HTML 4.01/XHTML 1.1, but they are notoriously out of date, and as such, the separate browser developers like Microsoft, Mozilla, and Opera have taken it upon themselves to build upon them.

    The whole scenario is really one big catch-22: how do the WYSIWYG editors add features if each browser has its own set of “quirks”, how do the browser developers get rid of the quirks if there’s no set standard, and how does a standard get implemented if no one who uses the WYSIWYG editors cares anyway?

    I guess it’s anyone’s guess how we get ourselves out of this mess.

    Posted by Justin Smith on July 10, 2007
