MusicXML Design Issues

MusicXML follows MuseData and other formats in separating the underlying musical representation from the specifics of a particular engraving or music performance. As with MuseData, the three domains are combined within a single format. The “logical domain” of music is found in MusicXML’s elements, while details of the visual and performance domains are found in MusicXML’s attributes. There are also dedicated <print> and <sound> elements for cases where attributes associated with logical domain elements were not sufficient.

The integration of the three domains into a single format speaks to the need to cover an adequate range of music applications in a single notation format. The distinction between elements and attributes facilitates the segmentation of domains, both for learning MusicXML and for building applications. Distinctions between domains tend to be cleaner in theory than in practice. Given MusicXML’s commercial focus, it made sense not to be overly rigorous about these theoretical distinctions.

To introduce how MusicXML represents musical scores, here is the musical equivalent of C’s “hello, world” program: about the simplest music file we can make, with one instrument, one measure, and one note, a whole note on middle C:

Figure 2: A Musical “Hello, World” (whole note middle C in 4/4 time)

Here is the musical score represented in MusicXML:

                <?xml version="1.0" standalone="no"?>
                <!DOCTYPE score-partwise PUBLIC 
                  "-//Recordare//DTD MusicXML 0.6b Partwise//EN"
                  "http://www.musicxml.org/dtds/partwise.dtd">
                <score-partwise>
                  <part-list>
                    <score-part id="P1">
                      <part-name>Music</part-name>
                    </score-part>
                  </part-list>
                  <part id="P1">
                    <measure number="1">
                      <attributes>
                        <divisions>1</divisions>
                        <key>
                          <fifths>0</fifths>
                        </key>
                        <time>
                          <beats>4</beats>
                          <beat-type>4</beat-type>
                        </time>
                        <clef>
                          <sign>G</sign>
                          <line>2</line>
                        </clef>
                      </attributes>
                      <note>
                        <pitch>
                          <step>C</step>
                          <octave>4</octave>
                        </pitch>
                        <duration>4</duration>
                        <type>whole</type>
                      </note>
                    </measure>
                  </part>
                </score-partwise>
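As a minimal sketch of how an application might read this file, the Python fragment below pulls the pitch and duration of each note straight out of the elements. It is an illustration under stated assumptions, not code from the MusicXML distribution: the function name `read_notes`, the `SEMITONES` table, and the MIDI note number convention (middle C = 60) are choices made here, the optional `<alter>` child is handled even though the example does not use it, and the XML declaration and DOCTYPE are left out of the embedded string for brevity.

```python
import xml.etree.ElementTree as ET

# The score from the listing above, without the XML declaration and DOCTYPE.
SCORE = """<score-partwise>
  <part-list>
    <score-part id="P1"><part-name>Music</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key><fifths>0</fifths></key>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>G</sign><line>2</line></clef>
      </attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>"""

# Semitone offset of each step above C (illustrative helper table).
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def read_notes(xml_text):
    """Return one dict per pitched note: sounding pitch and duration."""
    root = ET.fromstring(xml_text)
    notes = []
    for part in root.findall("part"):
        divisions = 1  # divisions per quarter note; updated when <divisions> appears
        for measure in part.findall("measure"):
            for child in measure:
                if child.tag == "attributes":
                    text = child.findtext("divisions")
                    if text is not None:
                        divisions = int(text)
                elif child.tag == "note":
                    pitch = child.find("pitch")
                    if pitch is None:
                        continue  # skip notes without a <pitch> child
                    step = pitch.findtext("step")
                    octave = int(pitch.findtext("octave"))
                    # Assumption: <alter> is optional and holds whole semitones.
                    alter = int(float(pitch.findtext("alter", "0")))
                    notes.append({
                        "step": step,
                        "octave": octave,
                        "midi": 12 * (octave + 1) + SEMITONES[step] + alter,
                        "quarters": int(child.findtext("duration")) / divisions,
                        "type": child.findtext("type"),
                    })
    return notes


if __name__ == "__main__":
    for note in read_notes(SCORE):
        print(note)
```

For the example score this yields a single note: step C, octave 4, MIDI note 60, lasting 4.0 quarter notes, with graphical type "whole". No clef, key signature, or accidental state has to be consulted to get the sounding pitch.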

For scores of this simplicity, MusicXML’s design roots are clearly apparent. This is basically an XML version of the MuseData representation. Several of MusicXML’s design elements, including the interchangeability between partwise and timewise formats, have been described previously [5]. Here we will focus on some additional design aspects that have proven to be important for music translation, and that look to be important for future work in musical analysis.

One key design choice is that each aspect of music semantics is represented in a different element. This provides the greatest flexibility for diverse music applications, especially once music information retrieval is included in the application mix. Our example analysis programs below will demonstrate some of the benefits of this design choice.

Another key design element carried over from MuseData is the importance of separately representing what is heard vs. what is notated [13]. Take the issue of note duration. MusicXML follows MIDI and MuseData by putting the denominator of music duration, the number of divisions per quarter note, in a separate, usually unchanging <divisions> element. The whole note is represented both in sound, as a <duration> of 4 divisions, and in notation, as a graphical <type> of a whole note. It is useful to have both, since notation programs work more easily with the graphical type, while sequencers work more easily with the duration values. In other cases, such as jazz, the sounding duration differs from the written duration, so both elements are required for an adequate representation of both the musical sound and the musical score.

This type of dual representation of sound and graphics, so crucial to support diverse industrial applications, contrasts with the graphical representations used in NIFF and the WEDELMUSIC XML format [1]. NIFF is a binary format, but if we translate its binary elements directly into an XML document, our middle C whole note would look something like this:

                <Notehead Code="note" Shape="2" StaffStep="-2">
                   <Duration Numerator="1" Denominator="1"/>
                </Notehead>

The StaffStep attribute tells us that the note is two staff steps, or one line, below the staff. But what is its sounding pitch? To determine that, we need to check the clef and key signature, handle any accidentals that preceded the note in this measure, look for any accidentals in a note that may be tied to this one, and interpret any 8va markings. This is a lot of computation for one of the two most basic elements of music notation: what pitch is sounding? Fortunately, the other basic element, the timing of the note, is represented much more directly.

The very indirect nature of pitch representation makes NIFF and other graphical formats unusable for most performance and analysis applications. It even causes problems in NIFF’s intended use, the visual transfer between scanning and notation applications. The NIFF importer included in Sibelius 2.11 has bugs that trace directly to missing one or more of the many steps needed to accurately determine musical pitch from NIFF data. Graphical formats have a long history in music representation, and are appropriate as internal formats for many applications, but have severe problems when used as the foundation of a general music interchange format.
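The steps listed above can be sketched in Python to show how much context a graphical format requires before a pitch can be named. This is a simplified illustration, not a NIFF reader: the function `sounding_pitch`, its parameters, and the clef table are hypothetical, staff step 0 is assumed to be the bottom line of the staff (consistent with the description of StaffStep above), and accidentals carried over a tie are left out.

```python
STEPS = "CDEFGAB"
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Order in which sharps and flats are added to a key signature.
SHARP_ORDER = "FCGDAEB"
FLAT_ORDER = "BEADGCF"

# Diatonic position (octave * 7 + step index) of the bottom staff line.
# Assumption: only these two clefs are covered in this sketch.
BOTTOM_LINE = {
    "treble": 4 * 7 + STEPS.index("E"),  # E4
    "bass": 2 * 7 + STEPS.index("G"),    # G2
}


def key_alterations(fifths):
    """Map each step altered by the key signature to +1 (sharp) or -1 (flat)."""
    if fifths >= 0:
        return {step: 1 for step in SHARP_ORDER[:fifths]}
    return {step: -1 for step in FLAT_ORDER[:-fifths]}


def sounding_pitch(staff_step, clef="treble", fifths=0,
                   measure_accidentals=None, octave_shift=0):
    """Derive (step, alter, octave, midi) from a graphical staff position.

    measure_accidentals maps (step, written octave) to the alteration set by
    an accidental earlier in the same measure; octave_shift is +1 under 8va.
    """
    # 1. Clef: turn the staff position into a step name and written octave.
    position = BOTTOM_LINE[clef] + staff_step
    octave, index = divmod(position, 7)
    step = STEPS[index]
    # 2. Key signature: default alteration for this step.
    alter = key_alterations(fifths).get(step, 0)
    # 3. Earlier accidentals in the measure override the key signature.
    if measure_accidentals and (step, octave) in measure_accidentals:
        alter = measure_accidentals[(step, octave)]
    # 4. Octave-shift markings such as 8va.
    octave += octave_shift
    midi = 12 * (octave + 1) + SEMITONES[step] + alter
    return step, alter, octave, midi


if __name__ == "__main__":
    print(sounding_pitch(-2))             # treble clef, no key signature
    print(sounding_pitch(-2, fifths=2))   # same position, two sharps
```

With the defaults, staff step -2 resolves to middle C (MIDI note 60). The same graphical position under a two-sharp key signature resolves to C sharp (MIDI note 61), so the notehead alone does not determine the pitch.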
