ideas & ramblings

Mathematically optimal HTML

(Originally published in 2010.)

One of the most underappreciated things in HTML5 is the standardization of browsers’ parsing algorithms. Historically, each browser handled broken or incorrect HTML in a slightly different way, so the same markup could produce different results depending on how a given browser chose to interpret it.

Now that all of the major browser manufacturers are adopting the same parsing algorithm, invalid HTML will be parsed in the same way by each of them. If, for example, you omit a closing </p> tag in your HTML, you’ll know exactly how every browser will react.

Theoretically, we can take advantage of this to compress our HTML – emitting the bare minimum of markup for a desired result. We’ll be able to rely on browsers adding implicit elements when we omit them – <html> and <body> and the like – and on their predictable behaviour when recovering from incomplete code.
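To make the idea concrete, here is a minimal sketch in Python that compares the size of a conventionally written page with a stripped-down version of the same page. Both documents and the helper names (byte_size, bytes_saved) are made up for illustration; the stripped-down version assumes an HTML5-compliant parser that supplies <html>, <head> and <body> and closes each <p> on its own.

```python
# Two hand-written documents intended to render the same way in a browser
# with an HTML5-compliant parser. Both are illustrative assumptions and are
# not taken from any real site.
FULL = """<!DOCTYPE html>
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</body>
</html>
"""

# Same content, relying on the parser to add <html>, <head> and <body>
# and to close each <p> when the next one opens or the document ends.
MINIMAL = "<!DOCTYPE html><title>Hello</title><p>First paragraph.<p>Second paragraph."


def byte_size(markup: str) -> int:
    """Size of the markup in bytes when encoded as UTF-8, before any compression."""
    return len(markup.encode("utf-8"))


def bytes_saved(full: str, minimal: str) -> int:
    """How many bytes the minimal version saves over the full one."""
    return byte_size(full) - byte_size(minimal)


if __name__ == "__main__":
    print("full:   ", byte_size(FULL), "bytes")
    print("minimal:", byte_size(MINIMAL), "bytes")
    print("saved:  ", bytes_saved(FULL, MINIMAL), "bytes")
```

The two versions do not produce byte-for-byte identical DOM trees (the full one carries extra whitespace text nodes), but they should display the same way.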

Google.com already uses this strategy to a limited degree; their HTML is notoriously concise and occasionally cringe-worthy, but it works, renders as they want it to, and saves them bytes on every page view. If you’ve got a popular enough site, those bytes add up.

As the HTML5 parser becomes commonplace and is better understood by clever developers, I fully expect to see HTML packers joining the JavaScript packers we’ve been using for years, emitting various flavours of heartbreakingly awful HTML that gets the job done.
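As a rough idea of what such a packer might look like, here is a deliberately naive sketch in Python. The function name pack_html and the list of tags are assumptions chosen for illustration, not an existing tool. A real packer would need to check the conditions under which the HTML specification allows each end tag to be omitted; this one simply assumes well-formed input where those conditions hold.

```python
import re

# End tags this sketch treats as droppable. The HTML specification permits
# omitting each of them only under certain conditions (for example, a </p>
# followed by bare text inside a <div> cannot safely be dropped), so this
# list is an assumption that only holds for simple, well-formed input.
OPTIONAL_END_TAGS = ("p", "li", "dt", "dd", "tr", "td", "th", "option")

_END_TAG_RE = re.compile(
    r"</(?:%s)\s*>" % "|".join(OPTIONAL_END_TAGS), re.IGNORECASE
)

# Whitespace sitting directly between two tags. Removing it is NOT safe
# inside <pre> or between inline elements, where it affects rendering.
_BETWEEN_TAGS_WS_RE = re.compile(r">\s+<")


def pack_html(markup: str) -> str:
    """Naive packer: remove whitespace between tags, then drop selected end tags."""
    packed = _BETWEEN_TAGS_WS_RE.sub("><", markup)
    packed = _END_TAG_RE.sub("", packed)
    return packed.strip()


if __name__ == "__main__":
    sample = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>"
    print(pack_html(sample))  # prints: <ul><li>One<li>Two</ul>
```

Even this crude version shows the flavour of the output: shorter, uglier, and still parsed the same way by any browser that follows the standard algorithm.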