Elements of MusicXML Design

To introduce how MusicXML represents musical scores, here is the MusicXML equivalent of C’s “hello, world” program. We will create about the simplest music file we can: one instrument, one measure, and one note, a whole note on middle C:

Whole-note middle C in 4/4 time, treble clef

Figure 1: Hello, World in MusicXML

Here is the musical score represented in MusicXML:

                <?xml version="1.0" encoding="UTF-8" standalone="no"?>
                <!DOCTYPE score-partwise PUBLIC
                    "-//Recordare//DTD MusicXML 0.5 Partwise//EN"
                    "http://www.musicxml.org/dtds/partwise.dtd">
                <score-partwise>
                  <part-list>
                    <score-part id="P1">
                      <part-name>Music</part-name>
                    </score-part>
                  </part-list>
                  <part id="P1">
                    <measure number="1">
                      <attributes>
                        <divisions>1</divisions>
                        <key>
                          <fifths>0</fifths>
                        </key>
                        <time>
                          <beats>4</beats>
                          <beat-type>4</beat-type>
                        </time>
                        <clef>
                          <sign>G</sign>
                          <line>2</line>
                        </clef>
                      </attributes>
                      <note>
                        <pitch>
                          <step>C</step>
                          <octave>4</octave>
                        </pitch>
                        <duration>4</duration>
                        <type>whole</type>
                      </note>
                    </measure>
                  </part>
                </score-partwise>

MusicXML can represent scores either partwise (measures within parts) or timewise (parts within measures), with XSLT stylesheets to go back and forth between the two. One of the key insights from the Humdrum format is that musical scores are inherently two-dimensional. Since XML is hierarchical, using XSLT to alternate between the two hierarchies gives us a useful way to simulate the lattice-like structure of a musical score. In our example, we are using a partwise score. The part list is very simple, containing one instrument. The score-part element’s id attribute is an ID that is referred to by the part element’s id attribute, which is an IDREF.
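
For comparison, here is a sketch of how the same one-note score could be arranged timewise. It is an illustration, not output from the XSLT stylesheets mentioned above: the XML and DOCTYPE declarations are omitted, and the element content is simply carried over from the partwise example, with the measure now enclosing the part:

                <score-timewise>
                  <part-list>
                    <score-part id="P1">
                      <part-name>Music</part-name>
                    </score-part>
                  </part-list>
                  <measure number="1">
                    <part id="P1">
                      <attributes>
                        <divisions>1</divisions>
                        <key>
                          <fifths>0</fifths>
                        </key>
                        <time>
                          <beats>4</beats>
                          <beat-type>4</beat-type>
                        </time>
                        <clef>
                          <sign>G</sign>
                          <line>2</line>
                        </clef>
                      </attributes>
                      <note>
                        <pitch>
                          <step>C</step>
                          <octave>4</octave>
                        </pitch>
                        <duration>4</duration>
                        <type>whole</type>
                      </note>
                    </part>
                  </measure>
                </score-timewise>

With one part and one measure, the two forms differ only in nesting order. The difference matters once a score has several parts, where a timewise file groups everything that sounds in a given measure together.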

The attributes element contains musical attributes of a score, such as the key signature, time signature, and clef. Normal key signatures are represented by the number of sharps or flats; the fifths element refers to the position of the key on the circle of fifths. The key of C with no sharps or flats has a value of 0; the key of F with one flat would have a value of -1. The time signature includes the numerator (beats) and denominator (beat-type). The representation of treble clef shows that the second line from the bottom of the staff represents a G.
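
As a further illustration of these elements, the fragment below sketches the attributes for a part in F major, in 3/4 time, with a bass clef. The values are chosen only for illustration and do not come from the example score. A fifths value of -1 gives one flat, and the clef places F on the fourth line from the bottom of the staff:

                <attributes>
                  <divisions>1</divisions>
                  <key>
                    <fifths>-1</fifths>
                  </key>
                  <time>
                    <beats>3</beats>
                    <beat-type>4</beat-type>
                  </time>
                  <clef>
                    <sign>F</sign>
                    <line>4</line>
                  </clef>
                </attributes>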

The first child element of the attributes element, <divisions>, is borrowed from MIDI and MuseData. Musical durations are commonly referred to as fractions: half notes, quarter notes, eighth notes, and the like. While each musical note could have a fraction associated with it, MusicXML instead follows MIDI and MuseData by specifying the number of divisions per quarter note at the start of a musical part, and then specifying note durations in terms of these divisions.

After the attributes comes the one and only note in the score: a C in octave 4, the octave that starts with middle C. Since we have one division per quarter note, the duration of 4 indicates a length of 4 quarter notes, or one whole note. The printed note type is included along with the duration. Notation programs can more easily deal with the written type, while MIDI programs deal more easily with the duration. In some cases the sounding duration differs from the written duration (as in jazz), so having both can be necessary for accuracy, not just for program convenience.
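
To see how divisions scale, suppose a part needed eighth notes. One possible encoding, shown here as an illustrative sketch and not as an excerpt from a real file, sets divisions to 2, so that a quarter note has a duration of 2 and an eighth note a duration of 1. The attributes element is abbreviated to the divisions element alone, and the pitches are arbitrary:

                <attributes>
                  <divisions>2</divisions>
                </attributes>
                <note>
                  <pitch>
                    <step>C</step>
                    <octave>4</octave>
                  </pitch>
                  <duration>2</duration>
                  <type>quarter</type>
                </note>
                <note>
                  <pitch>
                    <step>D</step>
                    <octave>4</octave>
                  </pitch>
                  <duration>1</duration>
                  <type>eighth</type>
                </note>

Under this setting, the whole note from Figure 1 would carry a duration of 8 while keeping the type whole.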

Contrast this musical representation of pitch and duration, well suited to both notation and performance applications, with the graphical representation in NIFF. NIFF is a binary format, but if we do a one-to-one translation of its binary elements to XML syntax, the whole note would be represented like this:

                <Notehead Code="note" Shape="2" StaffStep="-2">
                  <Duration Numerator="1" Denominator="1"/>
                </Notehead>

The StaffStep attribute tells us that the note is two staff steps, or one line, below the staff. What’s its pitch? For that we need to check the clef and key signature, then handle any accidentals that preceded the note in this measure, then look for any accidentals in a note that may be tied to this one. All this computation is needed for one of the two most basic elements of music notation: what pitch is sounding? Fortunately, the other basic element, the timing of the note, is represented much more directly. But the very indirect nature of pitch representation makes NIFF sub-optimal for most performance and analysis applications.

To illustrate how MusicXML gives better results than MIDI for music interchange, let us look at a typical difference in real-life interchange. We scanned in the fourth song of Robert Schumann’s Liederkreis, Op. 24, on poems by Heinrich Heine, using SharpEye Music Reader version 2.16. We corrected the scanning mistakes within SharpEye: music scanning is not yet as accurate as optical character recognition, especially when scanning older public domain music. We then saved the files from SharpEye to MIDI and MusicXML. We imported the MIDI files into Finale 2002 and Sibelius 1.4 using the default MIDI import settings, and imported the MusicXML file into Finale 2002 using the Recordare MusicXML Finale Converter Beta 1.

This song is not very complicated, so all of its musical features can be captured within SharpEye and saved to MusicXML. Figure 2 shows the last four measures of the song as scanned into SharpEye:

Figure 2: Excerpt from Schumann’s Op. 24, No. 4 as scanned into SharpEye

Figure 3 shows what the last four measures of the song look like when imported into Finale using MusicXML:

Excerpt from Schumann's Op. 24, No. 4 imported into Finale using MusicXML

Figure 3: Importing SharpEye into Finale via MusicXML

Figure 4 shows the last four measures when imported into Finale using MIDI:

Excerpt from Schumann's Op. 24, No. 4 imported into Finale using MIDI

Figure 4: Importing SharpEye into Finale via MIDI

The song lyrics are in the MIDI file, though Finale’s reader did not import them. Figure 5 shows the last four measures when imported into Sibelius using MIDI. Sibelius reads the lyrics, but uses treble instead of bass clef for the left hand of the piano part.

Excerpt from Schumann's Op. 24, No. 4 imported into Sibelius using MIDI

Figure 5: Importing SharpEye into Sibelius via MIDI

As you can see, the MIDI imports are much less accurate than the MusicXML import, even with a simple music example. Why is this the case? Let’s compare what is represented in the MIDI file with what is in the MusicXML file, using an XML version of the binary MIDI format. Consider the second measure of the left hand of the piano part. In the MusicXML file, we set the divisions to 6 to handle some triplets earlier in the song, so each eighth note or eighth rest has a duration of 3, and the measure looks like this:

                  <note>
                    <pitch>
                      <step>B</step>
                      <octave>2</octave>
                    </pitch>
                    <duration>3</duration>
                    <voice>3</voice>
                    <type>eighth</type>
                    <stem>up</stem>
                    <staff>2</staff>
                    <notations>
                      <articulations>
                        <staccato/>
                      </articulations>
                    </notations>
                  </note>
                  <note>
                    <rest/>
                    <duration>3</duration>
                    <voice>3</voice>
                    <type>eighth</type>
                    <staff>2</staff>
                  </note>
                  <note>
                    <pitch>
                      <step>B</step>
                      <octave>2</octave>
                    </pitch>
                    <duration>3</duration>
                    <voice>3</voice>
                    <type>eighth</type>
                    <stem>up</stem>
                    <staff>2</staff>
                    <notations>
                      <articulations>
                        <staccato/>
                      </articulations>
                    </notations>
                  </note>
                  <note>
                    <rest/>
                    <duration>3</duration>
                    <voice>3</voice>
                    <type>eighth</type>
                    <staff>2</staff>
                  </note>

In the MIDI file, represented using MIDI XML, the measure looks like this:

                  <NoteOn>
                    <Delta>0</Delta>
                    <Channel>2</Channel>
                    <Note>47</Note>
                    <Velocity>64</Velocity>
                  </NoteOn>
                  <NoteOff>
                    <Delta>48</Delta>
                    <Channel>2</Channel>
                    <Note>47</Note>
                    <Velocity>64</Velocity>
                  </NoteOff>
                  <NoteOn>
                    <Delta>48</Delta>
                    <Channel>2</Channel>
                    <Note>47</Note>
                    <Velocity>64</Velocity>
                  </NoteOn>
                  <NoteOff>
                    <Delta>48</Delta>
                    <Channel>2</Channel>
                    <Note>47</Note>
                    <Velocity>64</Velocity>
                  </NoteOff>

Consider the differences between the two formats. MIDI has no discrete note element; rather, notes are bounded by NoteOn and NoteOff events. Rests are not represented at all; they are inferred from the absence of notes. In the MIDI XML listing above, the first eighth rest survives only as the Delta of 48 on the second NoteOn event. This works very well for MIDI’s intended use with synthesizers and electronic musical instruments, but it is poorly suited to music notation. Given how much guessing the notation programs have to do to interpret a Standard MIDI File, you can understand why the results fall short, and fall short in a different way for each MIDI importing program.

MIDI also has no way to distinguish between a D-sharp and an E-flat; the one above middle C has a Note value of 63 in either case. Here Sibelius guessed correctly, while Finale guessed wrong. MIDI has no representation of beams or stem direction, and both programs got the beaming wrong in the voice part. The beaming follows the slur – which is also not represented in MIDI. Clefs are also missing from MIDI, so Sibelius guessed wrong on one part where Finale guessed correctly.
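
Returning to the D-sharp and E-flat case, MusicXML spells the pitch explicitly. The sketch below is not taken from the Schumann file. It uses the alter element, which does not appear in the examples above, to raise or lower a step by a number of semitones. Assuming divisions is set to 6 as in that file, a quarter-note D-sharp and a quarter-note E-flat above middle C could be written as:

                <note>
                  <pitch>
                    <step>D</step>
                    <alter>1</alter>
                    <octave>4</octave>
                  </pitch>
                  <duration>6</duration>
                  <type>quarter</type>
                </note>
                <note>
                  <pitch>
                    <step>E</step>
                    <alter>-1</alter>
                    <octave>4</octave>
                  </pitch>
                  <duration>6</duration>
                  <type>quarter</type>
                </note>

Both notes correspond to the MIDI Note value 63, but a program importing the MusicXML no longer has to guess the spelling.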

MIDI is the only music interchange format in common use for music notation today. When you see all the ambiguities and missing data it produces in this 4-bar excerpt from a simple song, you can see why sheet music desperately needs a comprehensive, Internet-friendly interchange format. MusicXML has a tremendous advantage over prior efforts like NIFF and SMDL: XML had not yet been invented when the earlier teams did their work.
