Defective Orthographies and the Web

June 2, 2009

A defective orthography is a writing system without a one-to-one correspondence between the letters and the phonemes of the language. English, for example, combines the letters ‘t’ and ‘h’ to write the th sound. Many languages exhibit this property.

The languages of the web, CSS and HTML in particular, suffer from the same defective gene: we’ve had to ‘hack’ their limited base vocabulary to create sounds that weren’t planned for or built in, such as laying out whole pages with tables.

HTML and CSS have it easier than English, of course. There are roughly half a billion people (‘systems’) who understand English, and an average generation lasts about 60 years. The web has perhaps six or seven major ‘systems’ that understand its languages (the browser rendering engines and their major versions), each with a generational lifetime of a few years. In theory, then, the orthography of the web has a far better environment in which to adapt and change.

Why, then, does the language still so closely resemble that of ten years ago? Has it not had the opportunity for four or five major revisions?

One could argue that the commercial environment in which web browsers exist (even the free ones) does not exhibit true evolutionary tendencies, for web technologies or anything else. The natural, ‘organic’ progress of technology can be deliberately held back by influential bodies in that environment (the large corporations holding majority market share) when maintaining the status quo serves their selfish goals: the selfish gene, as it were.

Or perhaps the languages are already evolved enough, and fast, lasting mutations are not yet fit for the environment. Perhaps the environment is still catching up with the languages themselves, which haven’t yet reached their full current potential. This is where technology differs from nature: we can plan for future needs, not just adapt to current ones.

Whatever the case, it will be interesting to see how web languages evolve over the coming years, as majority share (and with it, control) slips from the few while the range of systems speaking the languages diversifies to mobile and other devices. Will HTML and CSS split further into specific branches (for example, XHTML-MP for mobile), or will an increasingly diverse range of systems depend on a smaller core feature set? Could the part of the language actually in use shrink, with esoteric words disappearing from the vocabulary, as happens in spoken languages?
