Design Principles of XML

XML is in principle transparent: it is written as plain text, which in the simplest case uses only generic ASCII characters. Its ability to serve global commerce, which requires many alphabets and special symbol sets, is strengthened by its provisions for handling Unicode, the international standard for character encoding.
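The two points above fit together, as a small Python sketch shows. It uses the standard library's `xml.etree.ElementTree`, which is our choice of tool, and the lyric text is an invented example. A file containing only ASCII characters can still carry any Unicode character through numeric character references, and the parser yields the same text as it does for the equivalent UTF-8 input.

```python
import xml.etree.ElementTree as ET

# Pure-ASCII input: the non-ASCII characters are written as numeric character
# references (U+00FC "ü", U+00DF "ß", U+266F MUSIC SHARP SIGN "♯").
ascii_doc = b"<lyric>Gr&#xFC;&#xDF;e &#x266F;</lyric>"

# The same content stored directly as UTF-8, with an encoding declaration.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><lyric>Grüße ♯</lyric>'.encode("utf-8")

ascii_text = ET.fromstring(ascii_doc).text
utf8_text = ET.fromstring(utf8_doc).text

print(ascii_text)               # Grüße ♯
print(ascii_text == utf8_text)  # True
```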

Much attention was devoted in the early stages of XML’s evolution to tidying up inconsistencies and closing loopholes in earlier markup languages. Attention has also been given to reconciling XML with modern hardware and operating systems. SGML, XML’s predecessor, was designed in the mainframe era and still bears some evidence of those origins.

Syntax: Elements and Attributes

The syntax of XML is predefined and relies on two basic categories of information used in earlier markup languages: elements and attributes. An element is the basic unit of information in XML. In a description of music, elements might include notes, chords, and so forth. An element must have a name. It may contain attributes, dependent subelements (still called “elements”), and textual content.

For example, a note element might read:

            <note name="c" octave="1"/>

The simplicity of this statement illustrates the principle of immediate comprehensibility: we would have some idea of what it means even with no formal knowledge of XML. As in HTML and other markup languages, tags are enclosed in less-than (<) and greater-than (>) signs. In the construction above, “note” is the element name, also called the tag name.

Next come the attributes of the element. Each attribute consists of a name (here, name or octave), followed by an equal sign (=) and a quoted value (here, “c” or “1”).

If an element has no children, it ends with a forward slash (/) just before the closing sign (>). Note that the indicator meaning “no children” is part of the syntax itself.² Thus, no knowledge of the data model is necessary to tell whether a given element has children.
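A minimal sketch shows how a program reads this element. It uses Python's standard `xml.etree.ElementTree` module, which is our choice of tool and not something XML prescribes. The sketch parses the note element as printed in the original example, spaces around the equal signs included, and shows that the parser recovers the name, the attributes, and the absence of children from the syntax alone.

```python
import xml.etree.ElementTree as ET

# The element from the text; XML permits whitespace around "=".
note = ET.fromstring('<note name= "c" octave= "1"/>')

print(note.tag)     # note
print(note.attrib)  # {'name': 'c', 'octave': '1'}
print(len(note))    # 0  (number of child elements)

# A start tag immediately followed by its end tag is an equivalent
# way of writing an element with no content.
same = ET.fromstring('<note name="c" octave="1"></note>')
print(same.attrib == note.attrib and len(same) == 0)  # True
```

Attribute values always come back as strings. Interpreting octave as a number is left to the application.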

XML syntax has a hierarchical structure that fits well with basic concepts in musical notation. For example,

            <chord>
               <note name="c" octave="1"/>
               <note name="e" octave="1"/>
               <note name="g" octave="1"/>
            </chord>

In this case we have a chord with three notes as children. For an element with content, the tag without a slash (<chord>) is called a start tag, and the tag with a slash (</chord>) is called an end tag.
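In a parsed tree the hierarchy appears directly, as a Python sketch using the standard `xml.etree.ElementTree` module shows. Iterating over the chord element yields its note children in document order. Converting octave to an integer is our own choice for illustration.

```python
import xml.etree.ElementTree as ET

chord_xml = """
<chord>
   <note name="c" octave="1"/>
   <note name="e" octave="1"/>
   <note name="g" octave="1"/>
</chord>
""".strip()

chord = ET.fromstring(chord_xml)

# Iterating over an element yields its direct children in document order.
pitches = [(note.get("name"), int(note.get("octave"))) for note in chord]

print(chord.tag, len(chord))  # chord 3
print(pitches)                # [('c', 1), ('e', 1), ('g', 1)]
```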

In a hierarchy, start tags and end tags must always be properly nested. For example, the sequence

            <measure>
               <chord>
                  <note .../>
            </measure>

would not be allowed, because a </chord> tag is missing before the </measure> tag.
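A conforming XML parser treats such improper nesting as a fatal well-formedness error. The sketch below checks the example from the text with Python's `xml.etree.ElementTree`. The note's attributes stand in for the elided `...` and are purely illustrative, and `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Improperly nested tags are reported as a parse error.
        return False
    return True

# The example from the text, with a concrete note in place of "...".
broken = '<measure><chord><note name="c" octave="1"/></measure>'
# The same sequence with the missing </chord> end tag restored.
fixed = '<measure><chord><note name="c" octave="1"/></chord></measure>'

print(is_well_formed(broken))  # False
print(is_well_formed(fixed))   # True
```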

Semantics

Beyond the syntax, we can specify that in a valid music notation document every chord element must have one or more note elements as children. We can also specify which attributes may or must occur in a note element, and which values each attribute may take.

When we define such rules, we are creating an application of XML. XML itself is really a meta-format: it defines only the syntax and leaves the semantics for us to define. The syntax is thus strictly separated from the semantics.
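Because the semantics are ours to define, ordinary application code can also check them. The following Python sketch enforces the chord rule stated at the start of this section together with two attribute rules. Both attribute rules are hypothetical and invented for illustration, not taken from any published music format: a note name must be one of the letters a to g, and an octave, if present, must be an integer. The function name `check_chords` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: the permitted values of a note's name attribute.
NOTE_NAMES = set("abcdefg")

def check_chords(xml_text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the document passed."""
    root = ET.fromstring(xml_text)
    errors = []
    # iter() includes the root itself, so a bare <chord> document is checked too.
    for chord in root.iter("chord"):
        notes = chord.findall("note")
        if not notes:
            errors.append("chord has no note children")
        for note in notes:
            name = note.get("name")
            if name not in NOTE_NAMES:
                errors.append(f"note has invalid name {name!r}")
            octave = note.get("octave")
            # Hypothetical rule: octave is optional but must be an integer.
            if octave is not None and not octave.lstrip("-").isdigit():
                errors.append(f"note has non-integer octave {octave!r}")
    return errors

print(check_chords('<chord><note name="c" octave="1"/></chord>'))  # []
print(check_chords('<chord/>'))  # ['chord has no note children']
```

Both inputs are well-formed XML. Only the second breaks a rule that we, not XML, defined.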

Definition of the Semantics

In the initial version of XML, the semantics were expressed in a Document Type Definition (DTD), which served as a contract between the software that writes an XML document and the software that reads it. Anything that can be expressed in a DTD can also be expressed in what is now called an XML schema. Schemas may eventually replace DTDs altogether, but applications based on DTDs, which are upwardly compatible with schemas, are in no danger of being made obsolete.
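As an illustration, here is a small hypothetical DTD for the chord example. It is our own and not a published one, and it is declared in the document's internal subset. The sketch reads it with Python's standard `xml.parsers.expat` module. Expat is a non-validating parser, so it does not enforce the contract. It only reports the element and attribute declarations, which a validating parser would then check the document against.

```python
import xml.parsers.expat

# A hypothetical DTD for the chord example, placed in the internal subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE chord [
  <!ELEMENT chord (note+)>
  <!ELEMENT note EMPTY>
  <!ATTLIST note
            name   (a|b|c|d|e|f|g) #REQUIRED
            octave CDATA           #IMPLIED>
]>
<chord>
  <note name="c" octave="1"/>
  <note name="e" octave="1"/>
</chord>
"""

elements = []    # declared element names, in declaration order
attributes = []  # (element, attribute, required?) triples

def on_element_decl(name, model):
    # model is expat's representation of the content model; unused here.
    elements.append(name)

def on_attlist_decl(elname, attname, attr_type, default, required):
    # Called once per declared attribute; required is true for #REQUIRED.
    attributes.append((elname, attname, bool(required)))

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)

print(elements)    # ['chord', 'note']
print(attributes)  # [('note', 'name', True), ('note', 'octave', False)]
```

Against these declarations, a validating parser would reject a chord with no notes, or a note whose name is not one of the seven letters.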

²We use the word “child” to describe a subelement, that is, an element nested directly inside another.