MusicXML Score DTD Examples

The MusicXML score DTD is still under development as it is tested with more music, music formats, and musical applications. The complete DTD is available online. Some examples from the current version illustrate both the level of detail in MusicXML and some of the standardization issues that arise when defining an XML interchange language.

Figure 2 shows how the note element is defined in the current MusicXML score DTD. Internal parameter entities, such as full-note and voice-track in the following example, work like macros in other languages: each reference of the form %name; is replaced by the entity's text. The note definition is based on the layout of note records within MuseData.

Figure 2. Definition of a note element in a MusicXML DTD.

            <!-- Internal entities to simplify note definitions -->
            <!ENTITY % full-note "(chord?, (pitch | unpitched | rest))">
            <!ENTITY % voice-track "(footnote?, level?, track?)">
            <!-- Definition of the note element -->
            <!ELEMENT note ((((cue | grace), %full-note;) |
                           (%full-note;, duration, tie?, tie?)),
                           instrument?, %voice-track;, type?, dot*,
                           accidental?, time-modification?, stem?,
                           notehead?, staff?, beam*, notations*, lyric*)>
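
To make the macro analogy concrete, the following Python sketch expands the entity references from Figure 2 by plain text substitution. It is only an illustration: the helper name expand_entities is hypothetical, and a real XML parser applies additional rules when it expands parameter entities.

```python
import re

# Parameter entity definitions copied from Figure 2.
ENTITIES = {
    "full-note": "(chord?, (pitch | unpitched | rest))",
    "voice-track": "(footnote?, level?, track?)",
}

def expand_entities(content_model: str, entities: dict) -> str:
    """Replace each %name; reference with that entity's replacement text."""
    pattern = re.compile(r"%([A-Za-z][\w.-]*);")
    return pattern.sub(lambda m: entities[m.group(1)], content_model)

# The cue/grace branch of the note content model, before expansion.
grace_branch = "((cue | grace), %full-note;)"
expanded = expand_entities(grace_branch, ENTITIES)
print(expanded)  # ((cue | grace), (chord?, (pitch | unpitched | rest)))
```

Defining full-note once and referencing it in both branches of the note definition keeps the two branches consistent, which is the same benefit a macro provides in a programming language.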


Two elements within the note definition are pitch and tie. Each child element of pitch (step, alter, and octave) is defined to contain parsed character data (PCDATA). The tie, however, is an empty element with a required type attribute that indicates whether this is the beginning or end of the tie (Figure 3).

Figure 3. Elements within the note definition.

            <!ELEMENT pitch (step, alter?, octave)>
            <!ELEMENT step (#PCDATA)>
            <!ELEMENT alter (#PCDATA)>
            <!ELEMENT octave (#PCDATA)>
            <!-- Tie is an empty element with one attribute. -->
            <!ENTITY % start-stop "(start | stop)">
            <!ELEMENT tie EMPTY>
            <!ATTLIST tie type %start-stop; #REQUIRED>
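
As a rough illustration of how these definitions look in an instance document, the Python sketch below parses a hand-written note with the standard library's xml.etree.ElementTree module. The note itself is a hypothetical example, and its duration value is arbitrary; ElementTree does not validate against the DTD, so this shows only how the data is read.

```python
import xml.etree.ElementTree as ET

# Hypothetical note instance that follows the content models in Figures 2 and 3.
NOTE_XML = """
<note>
  <pitch>
    <step>C</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
  <tie type="start"/>
</note>
""".strip()

note = ET.fromstring(NOTE_XML)

# Pitch data lives in element text, which always arrives as a string.
step = note.findtext("pitch/step")
octave = int(note.findtext("pitch/octave"))

# The tie carries its information in an attribute on an empty element.
tie_type = note.find("tie").get("type")

print(step, octave, tie_type)  # C 4 start
```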


These definitions illustrate an interesting dichotomy in XML DTDs: element text is weakly typed, but attribute text can be strongly typed. The text of step, alter, and octave is declared only as #PCDATA, so a DTD cannot restrict its values, whereas the type attribute of tie is limited to the enumerated values start and stop. Yet for most purposes, it is generally better design practice to put semantic data into elements rather than attributes (Harold 1999). Elements are generally easier to manipulate than attributes from within an XML program, and elements can have more complex structure than attributes can.

The weak typing of element text helps make XML DTDs more extensible for new applications, but puts a heavier burden on documentation and software to handle interoperability. In our current MusicXML software, pitch names and note types are interpreted using American terminology (C, not do or ut; an eighth note, not a quaver or croche). This restriction comes from the software, not from the DTD: comments in the MusicXML DTD note it, but nothing in the DTD can enforce it automatically. This particular restriction may be removed in future iterations of MusicXML, but in general, dialect issues can arise within a DTD based on the actual content of the XML elements. One of the benefits of XML schemas is to make stronger typing available throughout an XML language definition, not just at the attribute level. This does not, however, eliminate the design tradeoff between extensibility benefits and dialect drawbacks.
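
The burden that weak typing places on software can be sketched as an application-level check. The Python example below is a minimal illustration, not the actual MusicXML software: the function check_note and the AMERICAN_STEPS set are assumptions made for this example. It rejects a step name that a DTD validator would accept, and repeats in code the start/stop check that the DTD already expresses for the tie attribute.

```python
import xml.etree.ElementTree as ET

# Assumed vocabularies for illustration only.
AMERICAN_STEPS = {"A", "B", "C", "D", "E", "F", "G"}
TIE_TYPES = {"start", "stop"}  # mirrors the start-stop entity in Figure 3

def check_note(note: ET.Element) -> list:
    """Return a list of human-readable problems found in one note element."""
    problems = []
    # Element text: the DTD sees only #PCDATA, so the software must check it.
    step = note.findtext("pitch/step")
    if step is not None and step.strip() not in AMERICAN_STEPS:
        problems.append(f"unsupported step name: {step!r}")
    # Attribute value: a validating parser could already reject bad values here.
    for tie in note.findall("tie"):
        if tie.get("type") not in TIE_TYPES:
            problems.append(f"invalid tie type: {tie.get('type')!r}")
    return problems

# A note written with a solfege syllable instead of a letter name.
solfege = ET.fromstring(
    "<note><pitch><step>do</step><octave>4</octave></pitch>"
    "<duration>4</duration></note>"
)
print(check_note(solfege))  # ["unsupported step name: 'do'"]
```

Keeping the accepted vocabulary in one place, as in the sets above, would make it easier to relax the restriction later without touching the DTD.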