HTML5 Sucks

Updated 21 Dec 2023

Since I wrote the first cut of this critique a decade or so ago, the languages of the web have evolved steadily. Some of the horrors I whinged about have been fixed, others dug deeper in. My own views have changed a little too, mainly with more bright ideas to make it author-friendly, such as re-purposing CSS rather more cleanly. But HTML5 still sucks.

Heresy

It's true. There are a few nice things about HTML5 but for the most part it has really gone the wrong way. Wow, the invective I get when I criticise it. It feels like I am a heretic, like we have a religion on our hands. "It's best practice," the solemn words are intoned as yet another O'Reilly classic of the past whisks past my head.

Well, let's not forget the WHATWG's current thoughts on HTML5 as a "living standard" that will never need superseding, and more, at https://html.spec.whatwg.org/.

"It must be admitted that many aspects of HTML appear at first glance to be nonsensical and inconsistent ... Features have thus arisen from many sources, and have not always been designed in especially consistent ways."

Right. One can make the same criticism of CSS. "Best practice", you said. Really? Best for whom?

And while I am on the state of the "standard", we may note that the official FAQ on GitHub happily blathers, "The whole standard is more or less stable." FFS! It is both "living" and "stable"? Really? The FAQ goes on to explain that there are no longer any stable snapshots because "The problem with following a snapshot is that you end up following something that is known to be wrong." So there you have it, a "more or less stable" specification that is constantly leaving its previous state behind. Naturally, the change history of this "stable" miracle no longer needs documenting; the only change notes that maintainers of legacy code ever need care about now are the logs buried in the GitHub archives. Not only is it constantly breaking its previous state, but it is doing so with neither warning nor traceability. This is known as "more or less stable". Got that clear? Back to the main rant, then.

Recall that HTML was originally supposed to be the human-readable lingua franca of the web, the language that anybody can throw together and get something up there. Yet as it evolved HTML saw itself more and more as a programming language, to be generated by machines for other machines to parse. Human needs were supposedly catered for – plain HTML is still human-readable for example. But the truth is, in functional terms its ethos has become subservient to the mantra.

Ah yes, the mantra. The author provides the content, HTML the structure or "semantics", CSS the style. Then there is JavaScript for the smart, programmatic interactivity. Make no mistake, the parallel use of these several languages is baked into the HTML5 specification.

HTML 1 came before the semantics-vs-style revolution and simply tried to do something practical for the author. It followed the natural human intimacy between semantics and style, reflecting the way we design and read documents. That is to say, HTML 1 markup was as much about layout as semantics. Moreover, the semantics were more about the meaning indicated to the reader by the layout than about the meaning understood by any software system.

Then came the XML revolution and by the time we reached HTML/XHTML 4 it had become an organised religion. The emphasis wholly changed, seeking to do something practical for the system developer by utterly divorcing presentation or style from semantics and then twisting the nature of the semantics to be machine-processable. The whole thing had been turned back to front: it was no longer the congregation who mattered, it was the priests and their liturgies. But the problem was, as every page author, designer and humble sinner knows, styling and semantics are indeed indissolubly linked: if you try to separate them you are bound to run into trouble.

With HTML 4 we had moved away from the original user-centric language and instead pretended to a rigid bottom-up distinction between semantics, content and style. Pretended to? That's right – it never worked and never can. Web pages and the people who write them are top-down things. By that I mean that we start with the broad container and work down to the details. Bottom-up is fine in a rule book for a programming language such as javascript, but not for a language designed to communicate stuff to us mere mortals; we simply do not work that way. XHTML tried determinedly to bridge the divide but failed dismally.

HTML5 was forced to recognise this, making practical concessions to the reader such as rowing back on strict XML conformance by recognising the value of browser compatibility with old web pages written in earlier versions of HTML. But, for new pages, it has pursued its machine-oriented mantra with the same old blinkered and unworkable fanaticism. CSS has also pretended to much the same, and has similarly failed – in this and other ways.

Yet to challenge the ancient and outdated mantra is to invite lectures, anger, flying O'Reilly classics, prejudice and hypocrisy of a truly religious depth and fervour (I succumb to one overt mention below, for insensitivity to accessibility for the disabled is one issue I cannot forgive). I actually get more angry rants over this than I do over my views on religion or politics. In truth, this religious fervour is probably the thing about HTML5 that sucks the most, because it prevents everything else from being recognised and fixed.

Stephen Hawking famously speculated that the boundary condition to the Universe is that it has no boundary. Similarly, but sadly not speculatively, the only standard feature of HTML5 is that it has no enduring standard features any more.

Do not think I am alone in all this. Especially, here is a "discussion" about whether HTML5 Sucks because "Under the cover of standardization, it brought us the exact opposite". To be honest this one is easily fixed. Just freeze out checkpoint releases – dot releases, say – once in a while, so that both page and browser coders who need to can know which novelties will endure and which may be safely forgotten. You used to do that? Oh....

And here are a couple of moans about CSS:

Let this vain heretic offer you a few more examples:

Tabular layout

HTML Tables layout is reserved for organised data, and using it for page layout is a Cardinal Sin. Nesting tables is even worse, a heresy of Satanic proportion. But what if I want a table-like page layout? By this I mean "table-like" in the sense of block-based. This kind of block-based layout is very common in organisational "house styles" such as those used by the UK Government – my meat-and-drink for over twenty years – where it often has to work in printed A4 page format, too. You might think, "well, stick to PDF", but don't forget that HTML5 is the W3C Master Plan to replace such unfortunately practical inconveniences, it's not just a way to get journalistic content online. And even Government Departments have moved online and regard the web page as the primary information medium, with PDF reaching the parts, such as properly formatted printed documents, that the web cannot.

To be fair, back in the day some really crap GUI web tools, such as early versions of DreamWeaver and FrontPage, minced up web pages into relentlessly nested tables with a myriad tiny cells until the poor web browser overloaded its horrible-tangle stack and crashed. They even diced up images so that each fragment, now an image in its own right, fitted into its own little cell. Such table-based automata were pure digital hideousness and "Don't use Tables for page layout" very properly got dinned into every rookie design-tool codie for a generation. But, as we shall see, the users of those tools are not the creators of the tools and the baby got thrown out with the bathwater.

Meanwhile, what are we supposed to do instead? Why, use CSS of course. It has lots of convoluted toys to create page layouts that dance to the tune of the user agent. But not, alas, to the tune of the content author. Content dances around at the software agent's pleasure and the poor reader sees only an unpredictable reshuffling of the author's original vision. Even assistive technologies get thoroughly stuffed in their desperation to second-guess what all those arcane CSS properties are intended to convey in terms of sequential presentation. More on that in a minute.

Meanwhile, what about the inverse problem, if we have tabular data but our code generation tool was written by a good little droid and conforms to the layout mantra? Why, how kind, the CSS table model provides substitutes for all the missing table components. The rowspan and colspan substitutes are implemented in execrably different ways but they are at last there. Up until recently we had to precisely oversize individual cells to spread across their neighbours.

Yes indeed, you can now use a styling language to mimic the semantics of a data table. Having been told solemnly not to use semantic markup to implement styling, we have instead been given a semantic implementation baked into the styling markup language. This direct reversal, dear souls, is self-defeating to the point of being utterly mental. Instead of unconscionable old Web Devils like me styling page layouts using semantic tables, it is somehow acceptable for HTML5 Angels to code semantic tables using a styling language. Whatever happened to the rigorous separation of semantics, content and style? Oh dear.

And of course, we couldn't let CSS Tables go without borking the implementation, could we? The HTML table-wide cellspacing and cellpadding must now each be implemented for different elements, the former for the table element and the latter for the td and/or th styling, while any local rowspan and colspan equivalents are an utter nightmare to implement. The poor page creator must struggle with the most complex hierarchies of presentation modes and sizing algorithms for the table cells as they dance to the tune of that relentlessly literal-minded user agent. Meanwhile those legacy oversize cells are still out there and, no doubt, so are the tools which spew them out. Oh, joy! Oh, rapture! as the great Stanley Unwin would have said.

While on the subject of tables, Wikitext (the markup language for Wikipedia) has a brilliant feature in sortable tables. Just click the icon in the column heading and the row display order is sorted on that column. It is incredibly useful when viewing big lists with lots of information fields for each entry. Many of you will have long used it in spreadsheets almost unconsciously. When you stop and think, all it does is restyle the existing information, it can hardly be described as programmatic. Yet the Wikitext implementation requires javascript, as neither HTML nor CSS has the necessary capability. Such a simple viewing tool really should not rely on a coding language to deliver it. All it needs is something like:

<table sort="columns">

or

<table style="sort:rows;">
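For comparison, here is a rough sketch of the javascript chore that such an attribute would make unnecessary. It is not Wikipedia's actual script: the function name sortRows, its parameters and the sample rows are all invented for illustration, and a real page would still have to read the cells out of the table and write them back afterwards.

```javascript
// Sort table rows on one column. A sketch only, not Wikipedia's script.
// rows: array of rows, each an array of cell text (hypothetical shape).
// column: zero-based index of the column to sort on.
function sortRows(rows, column, descending = false) {
  const sorted = [...rows].sort((a, b) => {
    const x = a[column];
    const y = b[column];
    const nx = Number(x);
    const ny = Number(y);
    // Compare as numbers when both cells parse as numbers, else as text.
    const bothNumeric =
      x.trim() !== "" && y.trim() !== "" &&
      !Number.isNaN(nx) && !Number.isNaN(ny);
    return bothNumeric ? nx - ny : x.localeCompare(y);
  });
  return descending ? sorted.reverse() : sorted;
}

// Illustrative data: aircraft and year of first flight.
const planes = [
  ["Spitfire", "1936"],
  ["Concorde", "1969"],
  ["Comet", "1949"],
];

console.log(sortRows(planes, 0));       // by name, ascending
console.log(sortRows(planes, 1, true)); // by year, newest first
```

Nothing in there is cleverer than a comparison and a reshuffle of what is already on the page, which is rather the point.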

Frames, Grids and Stateful URLs

Many publishing pros, especially those producing magazines, handbooks and other documents with strictly-defined house styles, will tell you how important frames and similar blocking-out elements are for effective page layout.

Early versions of HTML offered you a frame element to position a subsidiary web page in the main one. A frameset defined a layout incorporating multiple frames, which allowed a site to build a page from standard blocks without recoding each block in every page. You could even click through subsidiary pages in a given frame without disturbing the others. The problem was, the web browser could bookmark only the main frameset page, it didn't record which sub-pages populated the frames. So if you clicked through a few pages in one or two frames, the browser had no record of the page state you actually wanted to bookmark. And those framesets were getting so plumb popular, they were everywhere – the web was becoming un-navigable!

So did our folks improve the Favourites (aka Bookmarks) function to include every page in the frameset? Did they make urls stateful? No, of course not. Instead they decided to hate frames on principle, to just burn the whole thing and save having to think straight. Framesets were banned. That, my friends, was the moment when the W3C wholly lost touch with the professional page designer. And wow! It was a long time ago now: shunted off into a separate Frameset flavour in HTML 4 if I remember rightly, wholly ostracized from HTML5.

The iframe survived the cull, as a kind of stopgap media container until the real thing actually worked. Somehow it clung on and has even been regaining some of its popularity. Using it, you can create a window at some arbitrary place in your page, into which you can pour another page. But iframes are not well behaved – the window is difficult to adapt to circumstance. Each sits wherever normal flow drops it and to align multiple iframes the HTML5 designer must resort to floating and/or layering containing divs all over the place, managing how they all jump around and how the iframes they contain might overlap each other.

You can't even size an iframe to its content, so if you have different sized iframes you will have to size all their containers individually. And if the content grows, then you must use javascript to guess at a new size for the container. It all makes an utter cabbage of the way professional designers need to go about their business. The embed element is a bit like an iframe for media players to drop stuff into. You can even drop in a web page (MIME type text/html), making it effectively an alternative to the iframe, but it has pretty much the same layout problems.

But then, riding to the rescue comes the CSS Grid, a genuine way to do table-style page layout using a styling language. However it has serious problems. Firstly, it does not directly link to off-page content the way frames did and iframes still do. Secondly, layout should be part of the main markup language and not a styling thing; if we are to use a grid, it should be an HTML grid not a CSS one. Thirdly, it does not lend itself to useful DTP features such as circular and other arbitrary-shaped frames. It is a brave effort at getting away from some of the worst dance moves, but it misses the root problem.

Another showstopper is interactive pages that change their state as you click around, maybe loading a new page or media file into some iframe or media container. The user may want to bookmark that particular state so they can return to it. One could make bookmarks stateful, so they record each of those subsidiary files and can call them all up. It would be cleaner to make urls stateful in this way, so bookmarks don't need to get clever. Yes you can attach state data to an https "url", but it has to be passed on to some back-end processor to be digested and the content served back. Or you can rely on good ol' client-side javascript to do the heavy lifting. Either way, it is not inherent in the web technology and adds an unnecessary Turing-complete attack surface. Stateful urls would be understood by a simple reactive front-end web server, and thus cut out that extra layer.
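To make the idea concrete, here is a minimal sketch of what a stateful url might carry, written in javascript only because that is what is to hand. The scheme is purely an assumption for illustration: one query parameter per named frame, holding the file currently loaded into it. The function names encodeState and decodeState, the frame names and the example.org address are all hypothetical.

```javascript
// Hypothetical scheme: one query parameter per named frame,
// holding the file currently loaded into that frame.
function encodeState(pageUrl, frames) {
  const url = new URL(pageUrl);
  for (const [frameName, src] of Object.entries(frames)) {
    url.searchParams.set(frameName, src);
  }
  return url.href;
}

// Recover the frame-to-file mapping from such a url.
function decodeState(href) {
  return Object.fromEntries(new URL(href).searchParams);
}

const bookmark = encodeState("https://example.org/handbook.html", {
  nav: "contents.html",
  main: "chapter2.html",
});

console.log(bookmark);              // the url a bookmark would store
console.log(decodeState(bookmark)); // the page state it describes
```

The encoding is the trivial part. What is missing is any agreement, in HTML or in the browser, that those parameters mean "load these files into these frames" without a back-end or a script in between.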

What we need is actually really simple: tables for tabular data, frames which resize (or not) as specified to fit the content for page layout, and stateful urls. Let me repeat that. No, really, it's so simple I don't need to. That's all. W3C or WHATWG or whatever, just shut up about Seven Hells for the Unworthy and go do it (but see also below).

On another tack, linking between pages can cause problems with iframes. Sometimes, I want to insert generic content via an iframe, but include a link which acts on the containing page. For example I may implement a navbar as a sticky iframe, then want to place a "Back to top of page" button at the left hand end (why do sites and browsers never do that? How often have you hit Reload but the browser kept its current focus near the bottom, so instead you had to scroll, scroll, scro-o-o-ll to get back up there?!) But a boilerplate link inside an iframe will only go to the top of the iframe, not of the containing page. For it to work from inside the iframe, it would need to be either javascripted or customised for each page, both of which defeat the whole point of re-usable content in an iframe. I have evolved a workaround: leave the navbar corner empty, code the link directly in the main page, and use the CSS position property to make it stick over the blank space. Not nice, as it adds boilerplate to every page, but it works – as here (try it now). And not nice of HTML/CSS to force such a nasty workaround. It should enable links in an included page to work in the main page (or, if it does and I missed it, it should be a darn sight less obscure about it).

Normal flow and accessibility

The CSS specification has a concept of "normal flow", in which content is presented in the order it appears in the sequential flow of the source page. At first sight this appears to be inevitable and trivial – surely any web page is displayed in the sequence in which it is written?

Unfortunately, CSS also has the ability to break normal flow. For example a right-aligned floating box has to be inserted in the page before the content displayed to the left of it (unless you know the exact vertical offset for the particular user display - hello javascript). Thus, the earlier block of content is displayed after the subsequently coded block and to make it intelligible the source content has to be swapped round, breaking the normal flow.

Ten years or so into the new millennium I got into an argument over whether I should use HTML tables or CSS to lay out a certain page. I was loftily informed that tables, and especially nested tables, break accessibility tools such as screen readers for the blind. So I coded two examples, one using nested HTML Tables and the other CSS. The CSS version, in order to display properly, was forced to break normal flow. Then I asked a blind colleague to see what he could make of them. His assistive screen reader coped flawlessly and utterly transparently with the nested tables. It would always tell him if it encountered any data-containing tables or got a bit worried about them. This time it knew not to bother. When I told him the HTML was not only tables but nested he was astonished – his reader had given not the slightest clue that there were any at all and had happily sailed right past the issue. The CSS page fared less well. The scrambling of normal flow was read back to him verbatim, rendering the content unintelligible. Would our corporate web developers kindly update their guidelines in the light of an evidence-based formal request? Not likely, the religious fervour of the mantra was thrown, with no little malice, back in both our faces, I have to say really quite rudely. "So much for equality in the workplace", my blind colleague explained with heavy resignation and no surprise whatsoever. Galileo was wrong, Jupiter's moons do not go round it. Stop wasting our important time and go do something else, if you really know so little about theology. This, I might add, in a Government Department renowned for its enlightened stance on equality in the workplace. Imagine the others! Actually, you probably already work somewhere like that, you poor soul.

In 2020 I had a chance to quiz another blind surfer, who used a reader called Jaws. He reported that:

"I have to wait upwards of a minute or two to give the page a chance to finish rendering before Jaws can begin reading it to me else the CSS bullshit causes Jaws to restart the page every time it updates itself.

"Situations like where there is an image on the page (my browser doesn't load them, it just tells me that there was an image in certain places) and the text leading up to that point said one thing, suddenly Jaws has a shitfit & starts reading the page all over again -- when it gets to where that image had been the first time the text near it says something completely different.

"Another irritation is text hidden until you press a button to reveal it - in order to figure out that the previously hidden text is now available I have to force Jaws to reread the whole thing over again. Not TOO annoying if it was only a paragraph or two, a whole 'nother kettle o' fish if it's buried in the middle of a page that has the infinite scroll bullshit."

In other words, no progress worth the name in the last ten years. So much for lip service to accessibility.

Is normal flow a semantic or a stylistic concept? Well, if we break normal flow and present the second part of a page before the first part so the reader encounters it first, as the blind assistive text reader did, I think you will agree that it has broken the semantics. What's more, it was the styling language (float property) that broke the semantics, oops!

But now consider a centrally-focused layout with boxouts all around. Different people respond to different visual cues. Some will home in on the prettiest graphic, others on the boldest text, others the top-right corner (or whatever their national linguistic convention). A good layout caters for all of these. A typical reading paradigm (hardly a "pattern") for such a page is to start in the middle and then move progressively outwards in whatever direction meets your interest, possibly in a wildly zigzag path. At each node, when the reader is ready to move on, the eye scans the surrounding area in whatever mode is most appropriate to the reader's anticipation: more detail, a peer node, back to the core? This can be very effective in, say, a stakeholder analysis for a large and complex project. We may happily code the boxouts in any order because the reader is not in this case expecting a normal flow anyway. This time, breaking page flow does not break the semantics.

This helps to illustrate how the relative positioning of objects is sometimes a matter of semantics and sometimes a matter of style. Page layout cannot be cleanly dumped in either camp, and separating the two into distinct languages makes more cabbage for the poor page designer.

Consider a page comprising a diagram whose physical layout is determined by presentational elements such as frames and SVG graphics. This layout defines the semantics of the overall structure, with detailed semantics added using html markup. There may be an obvious normal flow, a master path through the diagram, but there is no sane way in which this can be hooked into the code design for all that positioning-cum-semantics salad. What the assistive reading flow needs is a human-centric markup, a way of tagging elements sequentially so that a reader app will process them in a comprehensible order. One way to achieve this would be to create a flow attribute which may be added to the containing elements which string together to create the normal flow, for example:

<div flow="17">My content</div>

Any rendering agent now knows the logical order in which to present the content and can check out the other styling attributes for the niceties.
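As a rough sketch of how little a reader app would need to do with such an attribute, consider the following. Everything in it is an assumption made for illustration: the flow attribute does not exist in HTML, the function name readingOrder is invented, and the rule that unnumbered blocks follow the numbered ones in source order is just one plausible choice.

```javascript
// blocks: the page's containers in source order. `flow` mirrors the
// proposed attribute and is undefined where the author set none
// (an assumption of this sketch).
function readingOrder(blocks) {
  const numbered = blocks.filter((b) => b.flow !== undefined);
  const rest = blocks.filter((b) => b.flow === undefined);
  // Numbered blocks come first, lowest flow value first;
  // unnumbered blocks keep their source order after them.
  numbered.sort((a, b) => a.flow - b.flow);
  return [...numbered, ...rest].map((b) => b.text);
}

// A floated sidebar has to be coded before the main text it sits beside.
const page = [
  { text: "Sidebar", flow: 2 },
  { text: "Main text", flow: 1 },
  { text: "Footer" },
];

console.log(readingOrder(page));
```

The source order stays whatever the layout demands, while the reading order comes out as main text, sidebar, footer.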

Div or diva?

We have seen there is a distinction between the stylistic and semantic aspects of positioning content in the page. The HTML <div> and <span> elements are curiously ambivalent constructs. Being part of the semantic language specification we should expect them to contain a set of semantically related items, and sometimes they do. But they have no inherent semantic of their own, in fact they were deliberately introduced to function in a wholly contradictory way, to stylistically present the contained items as a visual grouping, at respectively block and inline levels. In this they are merely empty placeholders for the accompanying CSS and are typically used as such. But often the purpose of that accompanying CSS is to distinguish the div or even span content from the surrounding content, as a way to implement the semantic function of a content separator and container. For example a semantic equivalent of <em> or <strong> may be implemented by decoratively styling a phrase within a span. In other words, the div and span are a dog's breakfast of elements. They are both a styling convenience and a semantic one because it is not possible to cleanly separate the two functions. It is essentially this ambivalence which has created the ability, of the div especially, to break normal flow.

One of the nastier things about relying on divs is that it is often necessary to nest a fragment inside a div, before it can be styled the way you want. For example the <img> element does not accept all the CSS positioning parameters, so it has to be wrapped in a div before it can be properly positioned. Arranging several images together then requires a complex nest of divs. It would be so much better if the image markup itself could accept all the necessary positioning code.

And the div has an even darker side. It has begun taking over the role of nested tables as the horror of automated page building. Everywhere on the web today you will find vast, complex nests of <div> elements vying with each other to misplace the closing </div> tags. Classes abound, with CSS calling up such cascades of burgeoning stylesheets and JavaScript doing the same, that the chances of a class name conflicting across two sheets rise exponentially, while the chances of finding and fixing the problem without some superficial kludge fall similarly.

A fairly trivial issue with these relentlessly nested divs is that 90% of the web pages I come across fail to close them all. Anything up to half a dozen or so may be left open, for the user agent to kill when it reaches </body>. Some miss out intermediate closing tags, leaving a right mess for the rendering engine to untangle – not always successfully. You nasssty little hobbitses, you. OK, I know it's a pain to program for when cutting YANSB (Yet Another Site Builder), but it's your mantra not mine, alright?
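Checking for this is not hard, which makes the failure all the less excusable. Below is a deliberately naive sketch of such a check; the function name countUnclosedDivs is made up for the purpose, and because it relies on a regular expression rather than a real parser it will be fooled by divs inside comments, scripts or attribute values containing ">".

```javascript
// Naive check: walk the <div> opening and closing tags in order and
// keep count. Not a parser; comments and scripts are not understood.
function countUnclosedDivs(html) {
  let depth = 0; // divs currently open
  let stray = 0; // closing tags met with nothing open
  for (const match of html.matchAll(/<(\/?)div\b[^>]*>/gi)) {
    if (match[1] === "/") {
      if (depth > 0) {
        depth -= 1;
      } else {
        stray += 1;
      }
    } else {
      depth += 1;
    }
  }
  return { unclosed: depth, stray };
}

const sample = '<div><div class="box"><p>Hello</p></div>';
console.log(countUnclosedDivs(sample)); // one div is never closed
```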

In all this mess, normal flow is thrown to the winds. The poor assistive reader is left in deeper mire than ever nested tables could achieve. The almost universal tendency these days to automated code salad is another reason why we need a separate approach to markup if we wish to designate a strictly semantic construct such as normal flow.

The div has proved its worth and cannot simply be shunted aside just because it breaks the mantra. Far better to shunt the mantra aside and accept that HTML must embody stylistic as well as semantic functions. Oh, and those guidelines to developers, you can stop yelling "don't use nested tables" at them and start yelling "don't use nested divs". Yesterday is already too late to start. "Don't use nested divs". I can't hear you, louder! "DON'T USE NESTED DIVS!". That's better, well done. Now, go live up to it.

As an afterthought, maybe divs could be given the characteristics which I advocate above for frames and framesets. Reducing the number of layout queens by one can be no bad thing. But then, frameset = nested divs. Um. I'll have to think about that. Meanwhile, take care with that normal flow.

Content re-use and transclusion

A magical thing about MediaWiki which has already been mentioned is the ability to create arbitrary fragments of information and then to re-use them by embedding them in other web pages. They call it transclusion and so will I. It is such a magical feature that many a Wikipedia editor cannot imagine building such a usable and accessible resource without it.

As soon as I begin to code up a web site, the lack of proper transclusion in HTML5 becomes painfully obvious.

The old HTML frames were a crude attempt at transclusion of a sort. I use the surviving <iframe> element for the site navigation on this blocki, primarily so that you don't need to enable javascript, even if its lack does make things clunky. Some of its limitations, along with the similar <embed> element, have been mentioned above.

It has been argued that resizing widths to fit arbitrary content is hard, undesirable, unpredictable, etc. etc. But wait, the <img> element has done that for images since the day HTML first hit the streets. I mean, come on guys! How hard can it be? In fact the handling of images implements many of the essentials for the way all page fragments should be transcluded.

The W3C specification is a bit dense, but it seems to assume that any included HTML content must be a complete web page in its own right. With the XML influence fed in through XHTML, we gained the idea of data islands. This is content which may or may not be HTML but is nevertheless embedded in the web page. But it too must be wrapped around with semantic definitions which make it in effect a complete page.

Wikitext markup is different. You can write a fragment of anything and transclude it into your page very easily. A navbox perhaps, a notice or a fragment of a table, say its header with a row of column headings. And you can add content to your transcluded item, say the message in the notice or the title of the table. The basic code format goes something like this:

{{transcluded file name|insert1=text to be inserted}}

Which in HTML might be defined for the sake of argument as:

<transclude complete="off" src="transcluded file name">
   <insert id="1">text to be inserted</insert>
</transclude>

where the "complete" attribute defines whether the source is to be included unmodified, containerised by closing off any HTML elements left open, or perhaps even topped-and-tailed to form a complete embedded document (in MediaWiki this completion mode is set globally on the server, which can be inflexible).

Actually, my Firefox browser happily renders such document fragments in an iframe, but I don't know whether this is universal among modern browsers, or how complete a fragment must be - for example must any element opening tags in the fragment be closed off in it too, or can that be left to a different fragment? Certainly, it is progress in the right direction.

The Wikitext approach also differs technically from HTML in that the transclusion is done server-side while the embedded HTML, be it in an iframe or embed element, merely links to the extra content so that it must be downloaded separately and the transclusion is then done by the client web browser. This has the effect of making any HTML implementation more limited than Wikitext because by its nature one cannot break the hierarchical nesting of opening and closing element tags once the page has been served, whereas crafty wikitext coding can play all kinds of games to insert, delete or modify tags.

MediaWiki has a further, powerful refinement in that you can treat the transcluded code fragment as a template and pass parameters to it from the main page. For example you can design an infobox template or a warning box and populate it with a list of values or a custom message, saving the need to code the whole display box for each instance of use.
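The substitution step at the heart of this is simple enough to sketch in a few lines. The triple-brace placeholder with an optional default, as in {{{insert1|No message}}}, follows MediaWiki's template parameter syntax, but the rest is an illustration only: the function name expandTemplate and the sample notice are invented, and the sketch does none of the parsing, nesting or escaping that MediaWiki itself performs.

```javascript
// Replace {{{name}}} or {{{name|default}}} placeholders in a template.
// A sketch only: no nesting, no escaping of the inserted text.
function expandTemplate(templateText, params) {
  return templateText.replace(
    /\{\{\{([^{}|]+)(?:\|([^{}]*))?\}\}\}/g,
    (whole, name, fallback) => {
      if (Object.prototype.hasOwnProperty.call(params, name)) {
        return params[name];
      }
      // No value supplied: use the default, or leave the placeholder alone.
      return fallback !== undefined ? fallback : whole;
    }
  );
}

// Hypothetical notice template with one parameter and a default.
const notice = '<div class="notice">{{{insert1|No message}}}</div>';

console.log(expandTemplate(notice, { insert1: "text to be inserted" }));
console.log(expandTemplate(notice, {}));
```

Fetch the fragment, fill in the blanks, display it. That is the whole of the "simple get-it-and-display-it" case, and it is hard to see what in it needs a general-purpose programming language.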

Content re-use isn't a functional thing, it's an authoring and display thing. In fact HTML does make a nod to it with images. You can re-use an image in different pages, merely by linking to it. Imagine having to embed the image file content in every web page it appears in (like some email formats do), well, that's where we are now with text re-use! It ought to be right up there in the page authoring language. But it isn't. Gah!

I am being a little unfair to HTML here. The server-side smarts of Wikitext have more in common with a web development language such as PHP or ASP than with the HTML those languages churn out. Nevertheless, without some such functional content re-use, HTML is badly crippled. HTML5 can only get round it by resorting to JavaScript and programmatic interaction with the Document Object Model (DOM), but there is no reason why the simple get-it-and-display-it functionality, even the more basic templating features, could not be written into HTML.

Layering and the Z-index

Back in the day, the browser just rendered your HTML wherever you dropped it into the page. Then CSS came along with its dancing divs. Suddenly, elements started overlapping like layers, so the z-index property was added. But the z-index is an obscure and unpredictable diva, with arcane rules about when it does or doesn't work, the most arcane of which are so bizarre and counter-intuitive that they are hardly known. Who would imagine that transparency or css regions could reset the stacking order root element? (Of course you really ought to know what that means, it's baked into the css specification – right?)
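The transparency trap can be sketched in a few lines. The class names are made up for illustration. As I understand the rules, any opacity below 1 on the wrapper starts a new stacking context, so the red box's enormous z-index only counts inside that wrapper and the blue box, with its humble z-index of 1, should still end up on top.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Stacking context demo</title>
<style>
  .box     { position: absolute; width: 8em; height: 8em; color: white; }
  /* Any opacity below 1 makes the wrapper a new stacking context root */
  .wrapper { opacity: 0.99; }
  .red     { z-index: 9999; background: red;  top: 1em; left: 1em; }
  .blue    { z-index: 1;    background: blue; top: 3em; left: 3em; }
</style>
</head>
<body>
  <div class="wrapper">
    <!-- z-index 9999 is only compared with other things inside the wrapper -->
    <div class="box red">z-index 9999</div>
  </div>
  <div class="box blue">z-index 1</div>
</body>
</html>
```

Delete the opacity line and the red box should jump back in front, which is about as intuitive as it sounds.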

You might think that layering and content re-use are natural partners, and so they should be. But not in HTML5 they aren't. The z-index will not follow across onto a sub-page from say an iframe. For example it is not possible to pop up an iframe from within a parent iframe, and have it overlap onto the main page outside the parent. The popup iframe is visible only within the box defined by the parent iframe; outside of that, it is cropped regardless of any z-indexing or positioning. Your only resorts are in-page duplication or javascript. Not only are the workings of the z-index tangled enough to make a string theorist wince, but it is fundamentally broken.
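A rough illustration of the cropping, using the iframe srcdoc attribute so that the example is self-contained. The sizes and colours are arbitrary. The popup inside the iframe asks for a 300px by 200px box and a sky-high z-index, but only the part that falls inside the 200px by 100px iframe is ever drawn (the iframe may offer scrollbars, but nothing spills onto the parent page).

```html
<p>The gold popup is 300px wide, yet it never escapes the iframe box below.</p>

<iframe title="Clipped popup demo" width="200" height="100"
  srcdoc="<div style='position:absolute; top:50px; left:100px; width:300px; height:200px; z-index:9999; background:gold;'>Popup</div>">
</iframe>
```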

Back in the day, Netscape 4 added a layer element to HTML. As Wikipedia currently puts it, "But in modern browsers, the functionality of layers is provided by using an absolutely-positioned div, or, for loading the content from an external file, an IFrame." Perhaps abandoning layers was a retrograde step. On the other hand, those layers also had a z-index attribute; you could all too easily specify contradictory values in the attribute and the associated stylesheet, so not everything about them was sane.

The SVG vector graphic format (see below) is part of the HTML5 specification but does not recognise either z-indexing or, perhaps surprisingly for those who have used WYSIWYG editors, layers. It achieves a similar effect by moving the foremost graphics to the bottom of the page source, so that when rendered they overwrite the pixels behind them. That is probably not a good idea for html, as an assistive reader could all too easily blurt out text from the overwritten bits. What this last problem does highlight is that explicitly hiding an element can be as valuable as plonking one on top of what is there already.

It is high time that layering and overlays were revisited. The first question is, should they be implemented as an html attribute (per Netscape) or a styling property (per HTML5)? Especially when used for popups and stuff, the z-index is essentially a smart feature rather than a semantic one. Therefore, for reasons discussed below, baking it into CSS was the right way to go (hey, I just endorsed HTML5 there, I bet you didn't expect that!). But adding new options to the z-index would just add a dimension to the tangle, promoting the string theory issue to M-theory. Better to introduce a new css overlay property that treats the parent page body element as an absolute root. But beware clashes between the two; implementing a new z-index root for each layer might still be a smart move.

Note that calling the property layer would be unwise. CSS already has layers, cascading layers specifically. Unlike the z-index, these layers do not control which elements are presented to the reader, they only control which of all the stylesheets in the cascade wins out. And they change the way the !important override works. It's a real dog's breakfast of dog-eat-dog prioritising (to mix metaphors). No surprise then that developers are increasingly relying on automated tools to generate their stylesheets for them. But enough of this digression, we just need to use a different name for content layering.
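To see what cascade layers do to !important, here is a small sketch with two made-up layer names, base and theme. As far as I can tell from the cascade rules, the later layer wins for ordinary declarations, but the order flips for !important ones, so the earlier layer wins those.

```html
<style>
  /* Declare the layer order: base first, theme second */
  @layer base, theme;

  @layer base {
    p { color: blue; text-decoration: underline !important; }
  }
  @layer theme {
    p { color: red; text-decoration: none !important; }
  }
</style>

<p>Red text (the later layer wins a normal declaration), yet still underlined
(the earlier layer wins an !important one).</p>
```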

The tale of em and strong

Way back when, HTML text could be italicised using <i> and bolded (or emboldened) using <b>. But this was deemed stylistic and the mantra banned it to CSS. So HTML grew <em> and <strong>, the supposed semantics underlying the styles, which could then be custom-styled using CSS.

But what is the semantic of strengthening a phrase? How does it differ from emphasising the phrase? To find out, we inevitably refer to the context in which it occurs. It may for example be that one is shouted while the other whispered, that one is a global declaration while the other is a variable value, that one is a section title while the other is an image caption (use either to choice for your table titles), or that one is used merely to distinguish it from the other in a hierarchical document outline. There is in fact no semantic set in stone beyond that inherent in the fact of a distinction between the visual styles themselves.

Funnily enough, this lack of defined user-centric semantic led to an inability among human programmers to define an unambiguous machine-centric semantic for the distinction between <em> and <strong>. Despite ever-more elaborate weasel words progressively refined to impress in a sequence of RFC specifications, it proved impossible to pin them down in any general way. To the surprise of nobody in the real world, one man's emphasis proved to be another woman's strength, and that was the end of it. The rule of <em> and <strong> never really stood a chance. We need <i> and <b>, they can never go away.

The primacy of the <i> and <b> stylistic elements has once again been acknowledged in HTML5. They are no longer deprecated, as the fanatical High Priests once decreed. Meanwhile <em> and <strong> remain useful to the data codie who needs to differentiate the two but has no knowledge of the intended rendering (somebody else will still have to make that mapping in the CSS stylesheet, which is to say almost certainly whatever the browser default is). The two systems now live uncomfortably alongside each other, the worm you cannot kill in the organic apple. HTML5 may be intended as a coder's paradise but the truth is, time and again the hard facts of life force it to break its own mantras. And even when it tries not to, there are plenty of real people out here with the strongest of practical motivations to do that for it.

MathML and SVG

HTML is not alone. It can be expressed as a dialect of XML, the original umbrella language invented specifically to enforce a separation of semantics, content and style. Two other dialects which deal with user presentation include MathML and SVG.

The mathematics markup language MathML has the problem of meaning vs. styling, aka semantic "content" vs. symbolic "presentation", in spades. On the one hand the graphic symbols used for a particular mathematical idea vary between national and historical conventions and even particular disciplines, while on the other hand a given symbol may have quite different mathematical meanings in all these various conventions. For example the decimal point is variously positioned at different heights or replaced by a comma, while the period represents a decimal point in some conventions and a multiplier in others. The separation of semantics and style into different languages proved totally unworkable and so both are explicitly included in one and the same language specification, MathML. The page author may choose to describe a given equation in terms of meaning, which helps diverse readerships who can call up the stylesheet for their local convention, or the author may find it more convenient to specify the presentation so that different copies, perhaps including printed versions, will look the same to all readers. Sometimes authors resort to mixing both content and presentational styling to cater for the more awkward situations. Sadly, being a solution munged together at the end of a long and agonising journey to a dead end and back, MathML has ended up about as human-readable as C++. The option of coding it semantically and then automatically squirting it through an XSLT for each local convention is not one which the average busy mathematician relishes.
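By way of illustration, here is x squared written both ways. This is only a sketch of the two flavours side by side. How well any given browser renders the second one is another matter.

```html
<!-- Presentation markup: says how x squared should look -->
<math>
  <msup><mi>x</mi><mn>2</mn></msup>
</math>

<!-- Content markup: says what it means (x raised to the power 2) -->
<math>
  <apply><power/><ci>x</ci><cn>2</cn></apply>
</math>
```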

But in the end, the labyrinthine syntax of XML has proved MathML's undoing. It is not as clear or concise, which is to say neither as human-readable nor as suited to machine processing, as other markup languages such as LaTeX (as rendered on the web by the likes of MathJax), and I have hardly ever seen it used.

And that's not the worst of it. Remember the mantra, page entities for the content meaning, CSS for the presentational style? Well, HTML5 only requires one of the corresponding MathML subsets to be understood and rendered by web browsers. Guess which one it is? Oh, if you thought for a moment it was the semantic content you really have not been paying attention. The joke is, of course, that HTML5 specifies the mantra-breaking presentational markup. That is actually a good thing; just as XHTML 1 failed in its bid to "do two things well" and was split into human-oriented HTML5 and machine-oriented XHTML 2 (dropping the human-readable space from the wrong moniker, bless their cotton socks), so too it would make sense for the two faces of MathML to part company. The various mathematical symbol conventions are after all a bag of human languages and dialects like any other. For example HTML does not attempt English-French translation but leaves you to go find a translation engine; MathML should do likewise (even if that engine were no more than an XSLT).

Scalable vector graphics SVG is even more obviously unable to separate "meaning" from style. Its whole job is to present visual objects in styles which convey information, and in a freestyle manner in which the meaning is evident from other visual cues of which the processing engine can have no knowledge. Yet as an XML dialect the mantra of separation survives. Things like positioning, shape and size are deemed semantic but colour and thickness are style. Sure, such an arbitrary division can be enforced and given to a machine to render, but it is not the way people work. And as soon as you bend your SVG image to the way people work you find the mantra to be a horrendous mess. For example sometimes it is the colour which carries the meaning. You need to change the colour of your discs? Well, update the CSS stylesheet. You need to change the size of your discs? Oh, that needs the SVG objects themselves hacking. Yet you are explaining how traffic lights work – the size of the light is just a stylistic convenience to fit the page, while the colour carries the semantic meaning of the light. The language has it back to front on this occasion. No, a meaningful separation of semantics and style is simply not possible in SVG. It is, if anything, even less human-readable than the equally labyrinthine but at least fundamentally rational MathML.
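The traffic light can be sketched as follows, assuming the SVG 1.1 division of labour in which fill is a style property but the radius r is plain geometry. The class names are invented, and newer browsers may well accept some geometry in CSS too, which only muddies the water further.

```html
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 40 100"
     width="80" height="200" role="img" aria-label="Traffic light">
  <style>
    /* Colour counts as style, so it lives in the stylesheet */
    .stop { fill: red; }
    .wait { fill: orange; }
    .go   { fill: green; }
  </style>
  <rect x="5" y="5" width="30" height="90" rx="4" fill="black"/>
  <!-- The radius r counts as geometry, so it is set on each element -->
  <circle class="stop" cx="20" cy="22" r="10"/>
  <circle class="wait" cx="20" cy="50" r="10"/>
  <circle class="go"   cx="20" cy="78" r="10"/>
</svg>
```

Changing the colours is one edit in the stylesheet. Changing the size of the lamps means visiting every circle.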

These languages are particularly significant here because they are baked into the HTML5 specification: any compliant user agent must be able to render all three. In effect they deliver exactly the inseparable blend of semantics and style that I have been banging on about all along. And that is a Very Good Thing because otherwise HTML5 would be unusable. But really, what's with the ludicrously broken and unenforceable mantra supposedly underlying it all, huh? The whole job could, and should, have been done so much more cleanly if folks had just ignored it from day one.

Active and passive: Cascading Simple Smarts

Then there's the active stuff, the smarts, that we do with JavaScript. HTML and CSS are the passive languages, the safe, secure languages, right? If you want anything actually doing and can tolerate the inherent insecurity of downloading arbitrary executables, then JavaScript is your friend, right? Oh dear again.

Well, on the one hand HTML has hyperlinks and forms, both of which can perform useful interactions and neither of which needs JavaScript or even CSS. "Oh, but I mean programmatic interactions of course", cries the thrower of the books, "Hyperlinks are just dumb navigation".

OK, so on the other hand we have the drop-down menu, another dumb navigation widget. Now it is a funny thing but almost every element of the classic windowing user interface – buttons, checkboxes and the rest – can be created in HTML alone, with the sole exception of the classic drop-down menu. You can use a mouse to click on a widget or a hyperlink, sure, but you can't use it to open a drop-down navigation menu, whether through hovering or clicking, without some crafty CSS3 or JavaScript. Like it's a magic smart thing that needs special treatment, just to be able to hover a mouse and display a list – in the right place – that is already written into the page, yeah? Yet an HTML form submission has to pull together a long response string from the user-set widget states, that's way more programmatic than just displaying or hiding a list that is already there. No, when it comes to HTML vs. JavaScript, the mantra has been borked both ways.
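For the record, the "crafty CSS" route looks something like this. It is a bare-bones sketch with a made-up class name and hypothetical link targets, and a production menu would want rather more care over keyboards and touch screens.

```html
<style>
  .menu, .menu ul { list-style: none; margin: 0; padding: 0; }
  .menu > li { position: relative; display: inline-block; }
  /* The sub-list is already in the page; it is merely hidden until needed */
  .menu > li > ul { display: none; position: absolute; top: 100%; left: 0; }
  .menu > li:hover > ul,
  .menu > li:focus-within > ul { display: block; }
</style>

<ul class="menu">
  <li><a href="#">Sections</a>
    <ul>
      <li><a href="#heresy">Heresy</a></li>
      <li><a href="#html6">HTML 6</a></li>
    </ul>
  </li>
</ul>
```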

This matters a lot because SECURITY - surely even you must have noticed the fuss about online security these days. Having to enable an inherently insecure environment such as javascript, just to navigate a page or sort a list or something, is appallingly unacceptable security design. If we want to secure the web, we absolutely have to move away from the idea that "This site requires javascript to be enabled in your browser". Now I know your Big Data paymaster yells "Heresy!" at me for that, but don't let that fool you. The data slurpers fill your favourite site templates with their javascript and cross-site scripting links for one reason, and one reason only, to suck away at the user's privacy. You know that, even if Big Data are paying you to forget. A lot of us these days disable javascript by default, and ultimately that rests on the W3C for making web standards so damned insecure.

Then, there are the CSS3 tricks such as conditionals and custom variables, with more such tricks seemingly added every year. I am sorry, mantra-chanters, but these are unarguably programmatic as opposed to passively stylistic. And you still keep adding these to CSS despite specifically creating it to stay clear of the programmatic stuff in the first place! Sheesh! I won't even begin on the incestuous nature of the HTML "class" attribute and the way that JavaScript and CSS fight for ownership.
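A short sketch of the sort of thing I mean. The variable name --gap and the class name cards are invented for the example. It has a variable, two conditionals and a spot of arithmetic, all in a language that was supposed to be passive.

```html
<style>
  /* A custom variable with a default value */
  :root { --gap: 1rem; }

  /* A conditional: reassign the variable on wider screens */
  @media (min-width: 40em) {
    :root { --gap: 2rem; }
  }

  /* Another conditional, plus arithmetic on the variable */
  @supports (display: grid) {
    .cards { display: grid; gap: calc(var(--gap) * 2); }
  }
</style>
```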

The simple truth is, even a passive web page needs some basic interactive navigation features in its markup language and HTML5 falls short. On the other hand, security demands that both HTML and CSS rigorously avoid programmatic smarts. The distinction between dumb in-page presentational interaction and smart wider-system programmatic interaction really needs to be sharpened.

Yes, there are three key aspects to any web page – semantics, content, style and smarts – no, there are four key aspects.... This is something that HTML5 really, really needs to get to grips with at the most fundamental level.

One might even try to add a fifth aspect. Some basic smarts are safe and simple enough to be active by default, even if the user wants javascript disabled. These include user interactions such as form-filling, media playback and display options via the less sophisticated CSS tricks. There are probably a few more. It makes sense to keep these out of the mundane styling language but still available independently from the main smarts processing. One option would be a kind of intermediate scripting language, between CSS and javascript. But five levels, do we really want all that? Especially when semantics and style are so inextricably intermixed?

A more user-centric alternative, focused on the HTML changes I am suggesting here, would be to move all the basic CSS style features back into HTML where they originally came from, allowing CSS itself to be repurposed as an intermediate user interactivity language. CSS might then be recast as "Cascading Simple Smarts". I certainly think that's preferable to YAWSL – Yet Another Web Scripting Layer.

One way to draw the line between HTML and CSS could then be to draw the same line as for printing off a web page. In this process, all the interactions get dropped and only the visible display gets rendered. Keeping HTML for the visible display and CSS for the interactive options would be a very clean way to implement the print API. Certain display interactions involved in form-filling might still need to be implemented in HTML, although the form submit function should be restricted to CSS.

However there will still need to be some overlap of the basic styling features, particularly inline. On the one hand I have argued that it is absurd to break from one language to another just to insert text emphasis and the like, so inline styling should by and large be included in HTML. On the other hand it is equally absurd to exclude in-line styling from site-wide cascading properties, so it also needs to remain in some form of stylesheet. This could be an HTML stylesheet, as say mystylesheet.htms, which sets values for HTML attributes using an HTML-like syntax, thus replacing basic CSS functionality in a format more familiar to the HTML author, while also saving the need to add endless complicated workarounds to the CSS standard.

It would be useful to introduce a more descriptive <smarts> HTML element and associated smart inline attribute to accompany the repurposing of CSS, and deprecate the current "style" ones. The latter might be extended to recognise the HTML stylesheet syntax as well, or it might be better to add, say, a <looks> element and look attribute for HTML styling.

And ninthly...

Well, eleventhly really, but who's counting any more (certainly not the HTML5 release CMS). While I am here, it is worth mentioning the odd minor idea/niggle.

Contents lists can be useful for structured pages, both for the mobile-oriented habit of stringing multiple short viewing "pages" into a single large html page and for longer narratives which break into multiple sections. Similarly for listings of images, embedded media, bookmarks, etc. Currently these have to be hard-coded or built server-side (or, I suppose, built client-side via javascript). It would be nice to have an HTML contents element which the web browser can automatically populate. This seems a perfect application for a revamped Simple Smarts CSS; it would dynamically build the list when you load the page. For example, to create a clickable list of heading1 through to heading3 elements:

<contents style="list:heading 1 3; links:yes;">Table of Contents</contents>
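For comparison, this is roughly what has to be hard-coded today to get the same result. The ids are hypothetical, and both they and the link text have to be kept in step with the headings by hand.

```html
<!-- Hand-coded contents list: every entry duplicates a heading elsewhere in the page -->
<nav aria-label="Table of Contents">
  <ul>
    <li><a href="#heresy">Heresy</a></li>
    <li><a href="#layering">Layering and the Z-index</a></li>
    <li><a href="#html6">HTML 6</a></li>
  </ul>
</nav>

<h2 id="heresy">Heresy</h2>
<h2 id="layering">Layering and the Z-index</h2>
<h2 id="html6">HTML 6</h2>
```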

List bullets can be custom images. Nice. But these images are often not vertically aligned in the right place and they cannot be repositioned. Not nice. The standard workaround is to place the image in the background, but this brings a minor curse of its own. Normally, the list-style-image can be specified in the list styling and the list items will inherit it properly. But if you put it in the list background then it does not get inherited by the list items. So a separate class has to be created for the list items and every flippin' entry in the list must have class="pretty" or whatever added to it. That is just additional clutter to maintain, it need not be this way. In fact, being able to specify all of the list item styling in the containing list element would be a godsend, be it inline or in the stylesheet, for example with a list-item-class style attribute, as in:

<ul style="list-item-class:pretty;">
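The background workaround, as it stands today, looks something like this. The image file name bullet.png is hypothetical and the offsets are guesses to be tuned by eye.

```html
<style>
  /* Bullet image as a background so that it can be nudged into line */
  .pretty {
    list-style: none;
    padding-left: 1.5em;
    background: url("bullet.png") no-repeat 0 0.3em;  /* hypothetical image file */
  }
</style>

<ul>
  <li class="pretty">First entry</li>
  <li class="pretty">Second entry</li>
  <li class="pretty">Third entry</li>
</ul>
```

With a stylesheet to hand, a child selector such as ul.pretty > li would spare the per-item class, though that is no help at all when styling inline.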

Box width auto settings can behave badly. I often want to align a narrow box of content centrally within its container. But the width of the box is content-dependent and the content say left-aligned. A short in-page notice or a contents list are typical examples. The current CSS system of using margin:auto; is counter-intuitive and clunky. It is also inflexible if one wants to refine the alignment with a slight offset (as any good typesetter may). Cleaner, though no more flexible, is the HTML align="center" attribute. But the problem with aligning say a div in this way is that the CSS width:auto; defaults to 100% container width. There is no simple and intuitive way to size the div to automatically fit the content (as there is for height – and even for images, remember?), so extra code has to be added for that. At the very least the width attribute needs an explicit value, for example "fit-content", as in:

<div width="fit-content">
and/or
<div style="width:fit-content;">
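The CSS half of this wish seems to have been granted somewhere along the way: recent browsers do appear to accept fit-content as a value of the width property, though the HTML width attribute on a div remains wishful thinking. A sketch of the stylesheet version, with a made-up class name:

```html
<style>
  .notice {
    width: fit-content;   /* shrink the box to its content */
    margin-left: auto;    /* then share the spare width equally */
    margin-right: auto;
    text-align: left;     /* content stays left-aligned inside the box */
  }
</style>

<div class="notice">
  <p>A short in-page notice.</p>
</div>
```

The offset problem remains, since auto margins give no obvious way to nudge the box slightly off centre.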

Tab stops were a feature of Wilbur (HTML 3.2). But they didn't make much sense in resizable web pages so they got dropped. But HTML5 has increasingly grown special goodies to paginate web content for the printer. So now any self-respecting author wants their tab stops back, just like wordprocessors have. We are supposed to use divs, to position them to align like tab stops. Happy? Err, no! As a self-respecting Web Devil I would rather burn in hell than wrestle with the necessary CSS to try and make such divs behave properly. When I hit a need for tab stops the other day, I resorted to an HTML table; its colspan attributes were so much simpler and quicker to code than all that div madness. We really do need tabs back, say a <tab> element, with the ability to set the stops for a paragraph style in CSS, along the lines of:

<p style="tab:left 8em; tab:centred 50%; tab:decimal 100%-12pt;">
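For the curious, the table bodge I resorted to was along these lines. The content is invented, and the align attribute is long obsolete but still honoured by browsers as far as I can see. It is only an approximation of tab stops. In particular there is no true decimal alignment, merely right-aligned figures with the same number of decimal places.

```html
<table>
  <tr>
    <td>Widgets</td>
    <td align="center">3</td>
    <td align="right">12.50</td>
  </tr>
  <tr>
    <!-- colspan lets one entry run across two "stops" -->
    <td colspan="2">Subtotal</td>
    <td align="right">112.75</td>
  </tr>
</table>
```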

HTML 6

So there you have it. Time and time again the distinction between semantics, content, style and smarts is flouted as if it didn't matter a damn, even where it isn't irretrievably broken, while at the same time it is ritually pronounced as the reason Why Things Must Be Done The Way They Are. What can be done to fix this broken standard that is HTML5?

Why can't we go back to something more human-centric and start again? This has happened in a minimal kind of way, because although all the old styling attributes have been pulled from HTML creation standards over the years, web browsers still have to cope with ancient legacy code. If I write <table cellpadding=4> today, the browser still has to deal with it properly. The world can still use all the old attributes and break the mantra without breaking the page. And, because they are often more convenient than CSS, we all do already and we will keep right on. Like <i> and <b>, all the rest is not only here to stay, but here for a good reason.

I believe that we should not be trying to pull styling elements and attributes from HTML but actually enhancing the original model with more of them and widening the values they can accept. For example being able to set the cellpadding in em or pt units would be extremely useful. They could also adopt the same cascading approach as CSS, with say the default cellspacing for all tables set in an "html stylesheet" and overridden where a local value is set.
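To make the comparison concrete, here is the legacy attribute alongside what it takes to get relative units today. The class name roomy is made up for the example.

```html
<!-- Legacy attribute: pixels only, but still honoured by browsers -->
<table cellpadding="4">
  <tr><td>One</td><td>Two</td></tr>
</table>

<!-- Padding in em units currently has to go through CSS instead -->
<style>
  .roomy td, .roomy th { padding: 0.5em; }
</style>
<table class="roomy">
  <tr><td>One</td><td>Two</td></tr>
</table>
```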

Another fundamental change would be introduced by the idea of layout as a core feature of HTML pages, being neither semantics nor style but a way to deliver both. The status of divs can now be openly understood and the good things about both tables and frames cleaned up.

Transclusion would reinforce the need for a bookmark and other browser navigation aids to record not just the url of the page visited but its internal state at the time, a topic I have not dwelt on here because it is not the fault of HTML.

So here's a summary of the changes I propose. Some of them are even quite sensible by anybody's standards:

  1. User-centric vs. process-centric semantic model to replace the semantics-content-style-smarts mantra model. If you must retain the present paradigm for processing purposes, you will need to ensure that there is a defined mapping between the machine-centric and user-centric semantics; if you have to do that as an XSLT, it serves you bloody well right.
  2. Layout-centric focus of the original HTML to be restored; recognise HTML layout elements (such as the div and iframe) as contributing a tacit but flexible user-centric semantic and expand the layout capabilities:
    • Re-introduce and update frames and framesets, as additional layout elements.
    • Extend the url specification to append state data such as destination urls of sub-pages and embedded media.
    • Introduce auto-sizing of iframes and similar to their content.
    • Extend the hyperlink specification, allowing in-page navigation links to reference the composite page as displayed, rather than only the page containing the link (or if you have already, tell me where to find it!).
  3. HTML stylesheets to be introduced, and the style element and attribute expanded to recognise the HTML format (or a looks element and attribute introduced).
  4. Active properties to be reshuffled between standards:
    • Some HTML form features to be deprecated and their functionality moved to CSS, as their interactivity goes beyond mere display or navigation.
    • CSS to be recast as a "Cascading Simple Smarts" user interactivity language, with smart replacing the current style HTML element and attribute.
    • CSS basic styling elements to be deprecated and their function restored to HTML.
  5. More CSS-only attributes to be duplicated as HTML attributes, for example:
    • Table overall border styling, independent from cells.
    • Set table cellpadding and cellspacing in em or pt units.
    • Drop-down menu lists, for example as an ml element.
    • List item formatting, including bullets and numbering.
  6. Miscellaneous HTML features:
    • Flow attribute to be introduced to enable the rendering flow to be specified.
    • Transclude element to be introduced, or if preferred the embed element to be extended, to allow seamless inclusion of text/markup fragments into a page. The ability to pass strings as variables would be extremely useful too, so that the target page can be used as a passive template.
    • Contents element to be introduced, for automated population of tables of contents, objects, etc.
    • Reintroduce a <tab> element or similar and add appropriate styling attributes.
    • Add a value to the width attribute, to automatically fit the element's content, e.g. width="fit-content" (either that or change the behaviour of the auto value).
  7. Miscellaneous CSS features:
    • Table sorting to be implemented as a CSS attribute.
    • Add a new flow property to aid assistive readers where the CSS causes normal flow to break.
    • Improve list item inheritance of styling specified in the containing list, e.g. via a list-item-class attribute, so that all the item styling can be specified in the overall list styling.
    • Deprecate and eventually remove the display:[table/row/column/cell] pseudo-semantics.
    • Add a new overlay property with simpler rules than z-index. These include: treat the master page body as the absolute stacking order root, hide elements with negative layer values, always implement the layer value specified, implement a new z-index root for each layer.

But would all this still actually be HTML5? "Of course!" cries the acolyte across the room, "HTML has become a living standard and we need never muck about with version numbers again. Go check the W3C, er, I mean the WHATWG website." Oh, get real, put down last century's Book of Common Prayer for just one moment. This would no longer be your kind of HTML, it would be my kind of HTML. And of course, as noted above in the context of MathML, we can revert to the human-language-oriented space between the standard name and its version number, as in HTML[space]6 and abandon your spaceless "look-ma-aren't-I-sophisticated?" HTML5. And that makes it HTML 6 whatever your liturgy claims. In fact, it makes it heresy.

Here's one thing you can do, even for HTML5. Take a leaf out of the software developer's book:

  1. Once in a while, drop off a stable version. Dot releases are a good way of doing this, though there is a strange fashion for ever-bigger counts of integer releases, which may ease your ego a tad.

That way, both page and browser creators who need such things can code to a common feature list and ensure compatibility across pages, browsers and time. Remember those clunky old things called "standards"? Yes, those. But would it still be "living" HTML5? Sorry bud, your theological contortions are not my problem.

Here, have that O'Reilly Koala book back. >WHACK!< Oops, sorry about that. Still, you do seem to need it more than I do.

And beyond?

The salad of different syntaxes and grammars for the variegated HTML5 languages remains a barrier to the learner. A single coding language, in which HTML, CSS and JavaScript functionality are respectively implemented as distinct dialects of the one common "look and feel", would be great.

The logic of dragging MathML and SVG into the HTML5 family highlights an issue of layout. Of these languages, MathML has no real concept of layout beyond individual equations, while SVG has no real concept of embedding elaborate text or maths in a graphical page. For a typical document containing text, illustrations and the odd equation, HTML5 elements must be used to create the layout as well as the text formatting. The system has its origins in historical conveniences, having grown piecemeal from the original HTML specification rather than through any principle of language design.

Is this wise? It certainly does not seem very neat. Might it be better to have one dedicated page layout language plus three separate content languages for text, graphics and maths? Each to do one thing but to do it well. One might call the text content language HTCL – HyperText Content Language. The page layout would then be defined in HTML and the content supplied in blocks or islands of the other three as required.

Suddenly, frames (or maybe divs plus embedding) vs. tables splits more obviously and cleanly, the one into HTML, the other into HTCL. You can create a complex diagrammatic page, such as the stakeholder analysis referred to above, without necessarily needing different apps to manipulate page layout and graphic. You can embed the first tenth of a fiendish quantum interaction or simplified proof of Fermat's last theorem likewise.

No doubt as the paradigm begins to happen, ways of tweaking SVG and MathML to snuggle up tighter would emerge.

Or, perhaps a fresh start with a uniform syntax and grammar throughout all the layout, content, styling and smarts languages would be a better idea. I chucked up some basic thoughts on panscript, years before even my original whinge here. Maybe I should go back and warm it up again.