
Panscript

Panscript is the germ of a proposal to reinvent the language of the web. It replaces the current babel of HTML, CSS, javascript, PHP, SQL and all the rest with a single language for creating, processing and delivering content, whether on a digital device or in print. Given recent developments in the use of HTML as a desktop environment, even coding the entire user experience in a single language appears feasible. Since Panscript began life on the old Linux Format wiki (now archived at https://web.archive.org/web/20160413203438/http://www.linuxformat.com/wiki/index.php/Panscript ), HTML5 has borne out many of my worst fears over the arbitrary and broken code salad that is today's web. I just know that we could - and should - be doing better than this.

Issues with this page: I need to split off the specification as a separate page and add a more comprehensive table of contents to both.

Contents

The problem with page authoring
The big idea
Why "panscript"?
Roadmap
Basic concepts
Basic syntax
Element classes
The outside world

The problem with page authoring

Modern web pages are a horrible mix of languages with widely differing grammars and syntaxes. For example a typical client page contains:

Then, the source page actually stored on the server may use a different language again, which essentially tells the server how to create the page it actually serves. Such languages include:

Meanwhile it is quite common to want a document available both in printed form and as a set of web pages.

The original content itself may be created in a variety of formats, some open ones being:

Often these source documents will contain features (headers, footers, page breaks and numbering, page cross-references, deep linking to the author's host operating system and such) which are unsuited to web pages. Likewise, HTML and friends contain links and stuff which are not suited to print publishing. Re-purposing content can involve a great deal of effort.

Even in its own small world of web page coding, HTML5 sucks. It is a dog's breakfast introduced by an agonised W3C trying to impose the dogmatic separation of reality into semantics, content and style. In practice they have been compelled to muddle them up and, through their own sense of religious crusade, have refused to admit that the job is botched and vacuous. To take just one example from my linked article: we have CSS (style) positioning introduced to avoid using HTML (semantic) tables for layout. Yet on the one hand audio readers for the blind cope far better with table-based layout, while on the other we also have CSS (style) based table (semantic) presentation, ostensibly for when the table semantics have been lost but in truth a simple reversal of the heresy - using style to mimic semantics. It's a horribly self-contradictory piece of hypocritical spaghetti logic and HTML5 is infested with the stuff.

It's all horrible! Uh, sorry, did you catch that? It's horrible! All of it!

The big idea

One day I got fed up with the prospect of learning yet more awful languages, one after another - HTML, XHTML, javascript, CSS, PHP, ... just to get a web page looking and working the way it should even when printed out. I thought, one language to do it all - why not? All you would need to do is learn one language, dump your stuff on the page and go. So that's what panscript is all about - one language for creating, processing and delivering page content, whether on the web or in printed documents or, hopefully, other access media such as speech synthesis and Braille.

Clearly, this language will be very rich - probably as complex as many human languages. But it should bring major benefits, such as:

Other anticipated features include:

For example an author could first learn the styling and layout markup, then move on to client-side scripting or page print formatting, already able to understand the basics of the code and needing only to expand their vocabulary a little. Or an application developer could start designing the page layout, again finding they have many of the skills already in place.

XML was the community's first stab at much of this, but it suffers from being long-winded and repetitive (full of word salad) to the point of incomprehensibility - thus defeating the original reason for having all those words in the first place. Also, because it is essentially an interpreted language, processing it is a slow business - and repeatedly processing all that repetitive word salad is badly clogging up the world of web services. It's time to move on.

Feature wish list

Many of the existing languages have some great features we wouldn't want to lose:

Sheesh! How we have to cherry-pick to get a decent feature set.

Structures and models

Different kinds of language have different ideas about how to organise information.

So one problem is how to respect and interact with all four kinds of model. If we want recursive hierarchies, for example calling a script from within a stylesheet that is itself within another script, then we will need to be very clear about these things.

A particular problem occurs with the concept of style, where the W3C community have got into a pretty muddle:

We need to understand the differing natures of all these kinds of "style", and how to relate to them. For now, notice that some come closest to "markup", others to "programming".

Another problem revolves around the identifying of different types of structural element. Here are some examples:

Verbosity

Back in the bad old days there came a time when most presentation formats, such as postscript pages or Word documents, had source code that was at best barely human-readable.

HTML was created as a human-readable presentation language. XML was intended to bring this goodness to more media formats by using richer human-readable markup, and was described by its designers as a "verbose" language. Sadly it went too far, causing two unfortunate effects:

Yet any practical web language must be human-readable. This is because the humble text editor is just too darn useful ever to die - it will always be an important development tool.

So -- how to square the circle?

Web 2

"Web 2" meant different things to different people.

To some it was the interactive web - blogs, wikis, instant messaging and YouTube. This is one of the main areas where a single development language, which is also a simple-to-learn markup language, can bring great benefits.

To others, Web 2 was the semantic web. There are two approaches to embodying semantics in web content:

Semantics can be multi-layered. Consider an XHTML (strict) element with some microformatted information, such as this fragment from an imaginary article on Sherlock Holmes:

    <h3><span class="address">221b Baker Street</span></h3>

The "h3" tag provides semantic information about the place of the fragment within the article - it is a third-level heading. Meanwhile the "address" value provides semantic information about the place of the fragment within Sherlock Holmes' life. Now, suppose we want to add some arbitrary third kind of semantic, say that the address is fictional, or that further information is available to subscribers, or ... . We need a general, extensible semantic framework whose syntax does not distinguish between levels in the way that XHTML tends to.

Also, we want to avoid the risk of snowballing complexity. Suppose I want to add markup compatible with both a "semantic web" XML standard foo and a popular microformat vBar. Here is another imaginary fragment which is trying to do this:

    <p class="address">
      <foo:FOO xmlns:foo="http://www.w3.org/2009/04/21-foo-syntax">
        <foo:address class="vBar">
          <foo:line1 class="firstline">221b Baker Street</foo:line1>
        </foo:address>
      </foo:FOO>
    </p>

But will the separate standards be able to recognise or ignore each other's markup as required? Will a vBar engine find its marker inside a foo markup tag? How do we indicate that the paragraph class references a CSS stylesheet, while the foo:address class references the vBar microformat? And so on. Oh, and fancy debugging that load of spaghetti? Call it human-readable? I don't!

So it would be nice to define a single "right" way of doing things. One thing seems clear: all those semantic elements and namespace URIs wished on us by XML are just horrible (and the more they get used, the more they weigh web services down with the massive processing overload of all that verbiage). Adding semantics as properties of a single element is far neater. Here's something more like the kind of structure I envisage:

    <p class="address" foo="address.line1" vBar="address.firstline">221b Baker Street</p>

And if you really want to link with your XML based web services, then you can knock up a nice XSLT on your application server and install a couple more CPUs, can't you? :-p
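As a rough illustration of why the flat form is easier to process, here is a small Python sketch that reads the single-element fragment above with the standard library's `html.parser`. The class name `SemanticCollector` and the helper `semantics_of` are made up for this example, and treating each attribute name as a separate vocabulary is an assumption, not a defined Panscript rule.

```python
from html.parser import HTMLParser


class SemanticCollector(HTMLParser):
    """Collect the attributes of every start tag as a flat name -> value map."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs. No namespace lookup
        # is needed: each attribute name stands for one vocabulary.
        # Note that HTMLParser lower-cases attribute names, so vBar becomes vbar.
        self.elements.append((tag, dict(attrs)))


def semantics_of(markup):
    """Return a list of (tag, attributes) pairs found in the markup."""
    collector = SemanticCollector()
    collector.feed(markup)
    collector.close()
    return collector.elements


fragment = (
    '<p class="address" foo="address.line1" '
    'vBar="address.firstline">221b Baker Street</p>'
)
for tag, attributes in semantics_of(fragment):
    print(tag, attributes)
```

Every layer of meaning comes out of one dictionary lookup, which is the whole attraction of hanging semantics on a single element.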

Other languages

No language, no matter how powerful, will ever exist in isolation. It will always have to interact with other languages. So it must be made possible, even easy, to embed "islands" in a foreign language structure, and likewise to embrace foreign islands.

Why "panscript"?

I wanted a good, memorable name. I also wanted it to start with 'P', so that the "LAMP" architecture could adopt it seamlessly (dream on!).

Pan was the ancient Greek god of shepherds and their flocks, so who better to name my language after than someone who looks after lots of similar things? (Useless factoid: Pan was also the god of popular music. In Kenneth Grahame's book The Wind in the Willows, Pan makes an appearance as the piper at the gates of dawn. Rock fans will immediately recognise the title of the first Pink Floyd album.)

The prefix Pan-, meaning "all", is also ancient Greek, so panscript has a neat double-meaning.

Roadmap

This needs a collaborative effort. I have neither the time nor the skills to do it all myself.

I don't intend to go very deep until the basic idea and syntax have been thrashed out. Think of this page as a kind of working whiteboard.

  1. Set down an intelligible draft of the top-level scheme of things and the basic syntax. - done.
  2. Get the top-level scheme of things knocked into shape, and the basic syntax defined.
  3. Define the definition, media.text and comment syntax and options to a basic functional level. This also needs to include links for hyperlinks, transclusion and images.
  4. Develop a proof-of-concept Viewer (possibly an XSLT / Firefox plugin or a bit of PHP).
  5. Extend this roadmap and keep following it. :-)

Current issues

Things to sort out with the draft spec:

Later thoughts

These need thinking through and working back into the page:

pangraph

Pangraph is a vector graphics language suited to hand-coding and to being parsed into SVG by a wiki server.

I have begun sketching out the basics without reference to the present panscript syntax. The idea is to then compare the two approaches and pick the best features to create a unified language.

Basic concepts

Foundational concepts include:

By "human" is meant the casual content creator as much as the skilled system developer.

The re-use of modular code leads to the idea of a document as a collection of disparate fragments, stitched together by some common framework. So the highest-level constructs, and where we need to start, are those that create this common framework: a language into which fragments can be stitched.

The syntax and grammar have to be universal, to work whether a fragment is written in the same language or another one, whether it is embedded in the framework or linked to as an external resource, and whatever kind of stuff that fragment might be.

The parallel with XML data islands should not be missed, nor should the need to be a good deal more readable and write-able.

Basic syntax

The fundamental building block of Panscript is a plain text file called a module or script.

Character encoding must be Unicode. It should be UTF-8 unless otherwise specified in the file type.

Universal constructs are confined to a basic Latin character set.

To aid in readability, non-Latin character sets are expected to be incorporated, for purpose-specific code, in locales where the human population use non-Latin alphabets and keyboards.

Modules are re-usable - any module will probably invoke many other modules.

Structure of a module

A script contains a hierarchy of elements, or objects.

Just to give a flavour, here is a simple "Hello world" example:

    [p0.1/My First script; Copyright Guy Inchbald, 2007. Licensed under the GPLv3.
    [[
      [m [[Hello world]] ]
    ]]
    My First script]

The detailed syntax borrows a little from the MediaWiki idea of repetitive key strokes, preferably unshifted. The general syntax for a Panscript element comprises a sequence of entities:

    [class/id; properties [[content]] id]

where the entities are defined as:

White space

In general, any sequence of white space characters (spaces, tabs, returns) acts as a simple separator, as if it were a single space character. Exceptions occur for certain kinds of text content. Where the Panscript syntax shows no space between entities, white space may be freely inserted, for example the following is equally valid:

    [class /id ;
      properties
        [[
 
          content
 
        ]]
    id ]
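To make the general syntax and the white space rule concrete, here is a rough Python sketch of a reader for a single element. It is not part of the specification. The splitting rules (class ends at the first `/`, id at the first `;`, properties at the `[[`), the names `Element`, `squeeze` and `parse_element`, and the collapsing of white space inside content are all assumptions made for illustration, and escaping is ignored entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    cls: str = ""
    id: str = ""
    properties: str = ""
    content: list = field(default_factory=list)  # text fragments and nested Elements
    closing_id: str = ""


def squeeze(text):
    """Treat any run of white space as a single space and trim the ends."""
    return " ".join(text.split())


def parse_element(src, pos=0):
    """Parse the element whose '[' is at src[pos]; return (Element, next position)."""
    if not src.startswith("[", pos):
        raise ValueError(f"expected '[' at offset {pos}")
    i = start = pos + 1
    # Header: class/id; properties, running up to '[[' or the closing ']'.
    while i < len(src) and src[i] not in "[]":
        i += 1
    head, _, props = src[start:i].partition(";")
    cls, _, ident = head.partition("/")
    elem = Element(squeeze(cls), squeeze(ident), squeeze(props))

    if src.startswith("[[", i):
        i = text_start = i + 2
        while not src.startswith("]]", i):
            if i >= len(src):
                raise ValueError("content opened with [[ is never closed")
            if src[i] == "[":
                # A nested element: keep any text seen so far, then recurse.
                if squeeze(src[text_start:i]):
                    elem.content.append(squeeze(src[text_start:i]))
                child, i = parse_element(src, i)
                elem.content.append(child)
                text_start = i
            else:
                i += 1
        if squeeze(src[text_start:i]):
            elem.content.append(squeeze(src[text_start:i]))
        i += 2  # step over the closing ']]'

    # Whatever remains before the final ']' is the optional closing id.
    start = i
    while i < len(src) and src[i] != "]":
        i += 1
    if i >= len(src):
        raise ValueError("element is never closed with ]")
    elem.closing_id = squeeze(src[start:i])
    return elem, i + 1


compact = "[class/id; properties [[content]] id]"
spaced = "[class /id ;\n  properties\n    [[\n\n      content\n\n    ]]\nid ]"
# Both layouts should yield the same element.
print(parse_element(compact)[0] == parse_element(spaced)[0])

hello, _ = parse_element("[m [[Hello world]]]")
print(hello.cls, hello.content)
```

The compact and the spread-out forms parse to the same `Element`, which is what the white space rule asks for. A real reader would also have to honour the exceptions for text content, where white space may matter.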

Aliases

An alias is an alternative id for an element or class. For example if something called thingummajig.whatsit exists, then we might want to create thing as an alias for it. Then, every time we need to reference the thingummajig.whatsit, we need only write thing.

This allows our code to be human-readable without running into the word salad problem.

One or more aliases may be established for any element or class. The default id for any element is its sequential number in the script. Any other id provided is effectively an alias for this number.

Some aliases are reserved (predefined), others may be user defined.
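One way to picture the rule that every id is an alias for a sequential number is a small registry, sketched here in Python. The class `IdRegistry` and its methods are hypothetical, and numbering from 0 is an assumption (chosen to match the default script id of "0" mentioned later on this page).

```python
class IdRegistry:
    """Number elements in order of appearance and map aliases onto those numbers."""

    def __init__(self):
        self._elements = []  # list index == default sequential id
        self._aliases = {}   # alias -> sequential id

    def add(self, element, *aliases):
        """Register an element under the next number, plus any aliases given."""
        number = len(self._elements)
        self._elements.append(element)
        for alias in aliases:
            if alias in self._aliases:
                raise ValueError(f"alias {alias!r} is already in use")
            self._aliases[alias] = number
        return number

    def find(self, ref):
        """Look an element up by its number or by any of its aliases."""
        number = ref if isinstance(ref, int) else self._aliases[ref]
        return self._elements[number]


registry = IdRegistry()
number = registry.add("thingummajig.whatsit", "thing", "whatsit")
print(number, registry.find("thing"), registry.find(number))
```

Reserved aliases could simply be entries loaded into the table before the script is read, with user-defined ones added as they are met.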

Escaping text

Text markup code always needs an escape system so that it is possible to include reserved code characters like [ and ] in text.

It is tempting to reserve \ for the single-character escape as in \[ and \], including escaping a \ character as in \\.
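A minimal Python sketch of that single-character scheme follows. It assumes the reserved characters are just `[`, `]` and `\` (the real list would probably be longer), and the function names `escape` and `unescape` are invented for the example.

```python
RESERVED = "[]\\"  # assumption: only the brackets and the backslash itself


def escape(text):
    """Put a backslash in front of every reserved character."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text):
    """Drop each escaping backslash and keep the character that follows it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            following = next(chars, None)
            if following is None:
                raise ValueError("backslash at end of text escapes nothing")
            out.append(following)
        else:
            out.append(ch)
    return "".join(out)


sample = "array[0] lives in C:\\temp"
print(escape(sample))
print(unescape(escape(sample)) == sample)
```

Escaping and then unescaping gives back the original text, which is the property any escape system needs.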

To escape a string, approaches to consider are:

Element classes

There are (provisionally) five top-level classes of element (kinds of stuff) that a script can contain:

Anything outside the script definition will be ignored. This allows a script to be embedded in other kinds of language.

The script definition

The first element in any script is its definition. This element contains all the others. The syntax for the definition uses the following values:

    [panscriptversion/Name of script; properties [[content]] Name of script]

Where:

For example, here is an empty script (i.e. with no content):

    [p0.1/Empty script; Copyright Guy Inchbald, 2007. Licensed under the GPLv3. Empty script]

Rendered (media) content

This is general media content (text, graphic, etc. maybe eventually audio and stuff) to be rendered by the viewing agent.

    [media/id; format [[content]] id]

Where:

Here is a very simple "Hello world" example:

    [m [[Hello world]]]

To create a functional script we put it inside a top-level object, something like:

    [p0.1/Example script; Copyright Guy Inchbald, 2007. Licensed under the GPLv3.
      [[
        [m [[Hello world]]]
      ]]
    Example script]

I may come back and define some sub-classes such as media-text, media-image (aliases t and i respectively). Who knows.

Executable code

I don't know a lot about programming languages in general, but here's what seems to be a workable approach:

    [executable/id; parameters [[instructions]] id]

Where:

Data

Data is stuff that is available for other elements, such as executables or content, to draw on.

    [data/id; format [[data]] id]

Where:

Passive comments

Comments are indispensable for adding helpful explanations and for hiding unused stuff.

    [comment/id; comment]

Where:

Note that a comment has no content entity, and is effectively an empty element. Any nested elements within the comment entity will be ignored. Any closing id will also technically be treated as part of the comment field, though this does not matter. Thus, in the following element, the "[[ ]] id" is all treated as contained within the comment field.

    [comment/id; comment [[ ]] id]

This is a bit unsatisfactory, as it breaks a basic rule of grammar about the [[ ]] container. But it is necessary, since commenting-out blocks of code will often place such brackets in the comment area. Well, it's not strictly necessary, but writing [c [[comment]]] every time would be more of a pain than [c comment].
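Here is a rough Python sketch of how a reader might skip comments under this rule. It assumes that brackets inside the commented-out code are balanced and unescaped, and that a comment opens with `[comment` or its alias `[c`. The names `COMMENT_START` and `strip_comments` are made up for the example.

```python
import re

# A comment opens with '[comment' or '[c', followed by white space, '/', ';',
# '.' (for a sub-class) or the closing ']'. This keeps '[class' from matching.
COMMENT_START = re.compile(r"\[(?:comment|c)(?=[\s/;.\]])")


def strip_comments(src):
    """Return src with every comment element removed, nested brackets and all."""
    out = []
    pos = 0
    while True:
        match = COMMENT_START.search(src, pos)
        if match is None:
            out.append(src[pos:])
            return "".join(out)
        out.append(src[pos:match.start()])
        # Count brackets until the one that opened the comment is closed.
        depth = 0
        i = match.start()
        while i < len(src):
            if src[i] == "[":
                depth += 1
            elif src[i] == "]":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        if i >= len(src):
            raise ValueError("comment is never closed")
        pos = i + 1


source = "[m [[Hello]]][c switched off: [m [[Goodbye]]] old id]"
print(strip_comments(source))
```

Because only the bracket depth is counted, the `[[ ]]` pairs and the closing id inside the comment are swallowed without being interpreted.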

Sub-classes

Many sub-classes of the high-level classes will be needed. The syntax is simply:

    class.subclass

You may wonder why there are so few top-level ones. For example there are likely to be sub-classes such as javascript, css, image, heading1 and so on. Why are these commonplace things not high-level classes themselves? Wouldn't that make them easier to type, too?

Well, firstly, we can distinguish for example x.javascript from m.javascript and c.javascript. The first of these will be executed, the second treated as rendered media (text) content (very useful for tutorials on javascript!) and the third is commented-out. So the developer can plug code in and out, try it out and present it to the reader and so on, and move from one mode to another just by changing a single character in the code.

Secondly, using aliases we can create javascript or ecma or js or whatever as an alias for executable.javascript. So when we want to add some javascript we don't have to write <script class="javascript"> ... </script> or even [x.javascript [[ ... ]]] but simply [js [[ ... ]]].

So along with the many standard sub-classes that will be needed, there will probably be even more standard aliases.

There might be a need to create further sub-subclasses, such as media.heading.2 or media.list.ordered, and so on. Again, aliases make such things manageable.
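The interplay of sub-classes and aliases can be sketched in a few lines of Python. The alias tables below are assumptions pieced together from the examples on this page (`m`, `x`, `c` and `js`); the function names and the handling labels such as "render" and "store" are invented for illustration.

```python
# Hypothetical alias tables, based only on the examples used on this page.
TOP_LEVEL_ALIASES = {"m": "media", "x": "executable", "c": "comment"}
WHOLE_NAME_ALIASES = {"js": "executable.javascript"}

# What a viewing agent might do with each top-level class (labels are made up).
HANDLING = {
    "media": "render",
    "executable": "execute",
    "comment": "ignore",
    "data": "store",
}


def canonical_class(name):
    """Expand aliases so that, for example, 'x.javascript' and 'js' agree."""
    name = WHOLE_NAME_ALIASES.get(name, name)
    top, dot, rest = name.partition(".")
    top = TOP_LEVEL_ALIASES.get(top, top)
    return top + dot + rest


def handling(name):
    """Decide what to do with an element from its top-level class alone."""
    top = canonical_class(name).partition(".")[0]
    return HANDLING[top]


for written in ("x.javascript", "m.javascript", "c.javascript", "js"):
    print(written, "->", canonical_class(written), "->", handling(written))
```

Changing the one leading character switches the same javascript between being run, shown and ignored, while the sub-class part of the name stays untouched.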

The outside world

Links

Links to other objects - Panscript modules or anything else - are embedded in one of the main element types. Not yet sure whether they go in as properties or content, or either depending on their purpose.

The Panscript language is designed so that paths such as high/medium/low/lower/nearly reached me/hello blur away the structural implementation - which is the filename, which the script definition, etc. I have a gut feeling that this is a Good Thing, but need to flesh the principle out a bit.

Embedding modules in other languages

A typical code object such as a web page or a script may contain several languages. Where multiple Panscript modules are embedded in such a page, each script definition must have an explicit and distinct id. Otherwise they would all default to "0", and it would not be possible to find any given script.

Going to script 0 will always find the first script in the page. If there is only a single script in the page then you can get away with the default id, but this is not recommended in case you later come back and add another script before it.
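As a sketch of how a host page might be searched for embedded scripts, here is a little Python. It assumes that a script definition always opens with `[p` followed by a dotted version number, and that brackets inside a script are balanced. The names `SCRIPT_START`, `find_scripts` and `script_id` are hypothetical.

```python
import re

# Assumption: a script definition opens with '[p' plus a dotted version number.
SCRIPT_START = re.compile(r"\[p\d+(?:\.\d+)*(?=[\s/;\[\]])")


def find_scripts(host):
    """Return every embedded script in the host text, ignoring all else."""
    scripts = []
    pos = 0
    while True:
        match = SCRIPT_START.search(host, pos)
        if match is None:
            return scripts
        depth = 0
        end = None
        for i in range(match.start(), len(host)):
            if host[i] == "[":
                depth += 1
            elif host[i] == "]":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            return scripts  # unterminated script: stop rather than guess
        scripts.append(host[match.start():end])
        pos = end


def script_id(script):
    """Return the explicit id of a script, or the default '0' if it has none."""
    head = re.split(r"[;\[\]]", script[1:], maxsplit=1)[0]
    _, slash, ident = head.partition("/")
    return " ".join(ident.split()) if slash else "0"


host = (
    "<p>intro</p>\n"
    "[p0.1/One; [[ [m [[Hello]]] ]] One]\n"
    "<script>var a = [1, 2];</script>\n"
    "[p0.1 [[ [m [[Bye]]] ]]]\n"
)
scripts = find_scripts(host)
print([script_id(s) for s in scripts])
```

The host's own brackets, such as the javascript array, are passed over because they do not start with a version class. The second script shows the hazard described above: with no explicit id it falls back to "0", and a second unnamed script would collide with it.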

Embedding other languages in modules

These are embedded as the content of an appropriate kind of high-level module.

Specifying the language might be done as a property of the module, or as a sub-class. Haven't thought about this yet.

Updated 11 June 2016