Finalizing the Database

We’ve been putting off a needed correction to the database that we can no longer delay because training any AI model without this correction would be impossible.  When we were deciding how to encode melodies written with standard Western music notation, encoding the notes as their numerical MIDI values was an easy (and correct) decision.  Thus, middle “C” would be encoded as “60”.  But what to do about rests?
Rests Are a Problem
At the time, it seemed reasonable to just let rests have the numerical value zero.  After all, a zero indicates nothing, which in music is no sound, corresponding perfectly with a rest where the musician literally plays nothing.  When we played an entered melody, we heard silence during the rest, as we should.  It seemed so simple and obvious that we didn’t think about what that implied.  There were several problems that arose over time.
For instance, coding rests as a zero raised what would eventually become a serious searching issue. That’s because our searching algorithm operates on the tuples created from each pair of consecutive notes. Here’s an example. Let’s say this was the music we were trying to encode:
[Figure: sheet music from Brahms’s Hungarian Dance No. 12]

Pay attention only to the first measure, or bar. Converted to MIDI values, with 0 for rests and each pitch followed by its duration, it reads as follows:

0 1 72 1 60 1 72 1 0 1 70 1 58 1 70 1

And converting that to tuples results in:

[72, 1] [-12, 1] [12, 1] [-72, 1] [70, 1] [-12, 1] [12, 1]

The tuples were created by subtracting the first pitch in each pair of consecutive notes from the second pitch, and dividing the first duration in each pair of consecutive notes into the second duration.  For example, the first tuple is found by 72 minus 0, and 1 divided by 1, or [72, 1].  You can check for yourself by doing the arithmetic as described for any pair of notes.
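
The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not our production code: the function name `to_tuples` and the flat pitch-then-duration list layout are assumptions made for the example.

```python
def to_tuples(flat):
    """Turn a flat [pitch, duration, pitch, duration, ...] list into
    (pitch difference, duration ratio) tuples for consecutive notes."""
    # Group the flat list into (pitch, duration) pairs.
    notes = list(zip(flat[0::2], flat[1::2]))
    tuples = []
    # Walk over each pair of consecutive notes.
    for (p1, d1), (p2, d2) in zip(notes, notes[1:]):
        # Second pitch minus first pitch; second duration divided by first.
        tuples.append((p2 - p1, d2 / d1))
    return tuples


# First bar of the example, with 0 standing in for rests.
first_bar = [0, 1, 72, 1, 60, 1, 72, 1, 0, 1, 70, 1, 58, 1, 70, 1]
print(to_tuples(first_bar))
```

Eight notes and rests yield seven tuples, one for each consecutive pair.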

Focus on the fourth tuple, [-72, 1], which results from subtracting the “C” sixteenth note in the above sheet music from the rest, or zero minus 72, which is -72.  Likewise, the duration ratio is 1 divided by 1, or just 1.  That’s how the “C” sixteenth note, followed by a sixteenth rest, is encoded in tuple form.

Now look at the tuple following that one, which represents the “Bb” sixteenth note after the sixteenth rest. “Bb” has a MIDI value of 70, so the tuple is 70 minus 0 followed by the duration ratio of 1 divided by 1, or [70, 1].

So far so good, but how do we know the “-72” means going to a rest? After all, it’s possible that two consecutive notes might be 72 MIDI values apart. Likewise, coming out of the rest to the “Bb” sixteenth note is represented by 70, which could also be a jump in notes.
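
To make the ambiguity concrete, here is a small sketch. The six-octave drop from MIDI 96 to MIDI 24 is a hypothetical pair of notes invented for the illustration, not something taken from the database, and the helper name `pitch_difference` is likewise an assumption.

```python
REST = 0  # rests encoded as zero, as described in this post


def pitch_difference(first, second):
    """Second pitch minus first pitch, as used for the tuples."""
    return second - first


into_rest = pitch_difference(72, REST)  # the "C" followed by a rest
real_leap = pitch_difference(96, 24)    # hypothetical drop with no rest involved

# Both cases produce the same pitch difference, so the tuple alone
# cannot tell a rest apart from a very wide leap.
print(into_rest, real_leap)
```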

Our solution at the time, which now looks foolish, was to use -42 and +42 to indicate going into a rest and coming out of a rest, respectively.  Why 42?  It’s not because of The Hitchhiker’s Guide to the Galaxy by Douglas Adams, who posited “42” as the answer to the question of “Life, the Universe, and Everything,” though that cultural reference made it more attractive.  We chose 42 to represent rests because rarely do we see changes in pitch over 41 MIDI values either up or down.  That proved to be correct in that even with 83,000 melodies, only a couple dozen have jumps in pitch over 41 MIDI values away.  Even then we knew this was a kludge solution, though a workable one.

However, using 42 to represent a rest leads to the second problem, which is more serious. Substituting 42 for the rest in the above musical example results in this tuple list:

[42, 1] [-12, 1] [12, 1] [-42, 1] [42, 1] [-12, 1] [12, 1]

Visually, this was helpful because when we looked at tuple lists, we knew that every time we saw a “42”, it was a rest. But this meant that our searching algorithm could be confused by rests and supply incorrect matches.  For instance, let’s say that another piece of music looked like this: 

This is identical to the first example except that the last three notes are “D” instead of “Bb” sixteenth notes. “D” has a MIDI value of 74, so the tuple line becomes:

[72, 1] [-12, 1] [12, 1] [-72, 1] [74, 1] [-12, 1] [12, 1]

The only difference is the “74” that now appears in the fifth tuple.  However, the searchable tuple line is still:

[42, 1] [-12, 1] [12, 1] [-42, 1] [42, 1] [-12, 1] [12, 1]

This is identical to the searchable line for the first example. That’s because any pitch difference over 41, in either direction, is assigned the rest value of 42. You can see the problem. Both musical examples appear as a match to the searching algorithm if we search for any sixteenth note followed by a sixteenth rest followed by any sixteenth note.
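
The collision can be reproduced with a short sketch. It assumes the rule stated above (any pitch difference beyond 41 in either direction becomes +42 or -42); the names `REST_MARK` and `to_searchable` are made up for the illustration and are not our actual search code.

```python
REST_MARK = 42  # value substituted for any pitch difference beyond 41


def to_searchable(tuples):
    """Replace every pitch difference larger than 41 (up or down)
    with +42 or -42, leaving duration ratios untouched."""
    searchable = []
    for diff, ratio in tuples:
        if abs(diff) > 41:
            diff = REST_MARK if diff > 0 else -REST_MARK
        searchable.append((diff, ratio))
    return searchable


# Tuple lines for the two examples, exactly as listed in this post.
first = [(72, 1), (-12, 1), (12, 1), (-72, 1), (70, 1), (-12, 1), (12, 1)]
second = [(72, 1), (-12, 1), (12, 1), (-72, 1), (74, 1), (-12, 1), (12, 1)]

# The raw tuple lines differ, but their searchable forms do not.
print(first == second)
print(to_searchable(first) == to_searchable(second))
```

The raw lines differ only in the fifth tuple, and that is exactly the value the substitution erases.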

In reality, it is not useful to search for a musical phrase as short as three notes (in this example, a C sixteenth followed by a sixteenth rest followed by a D sixteenth) because even if the encoding were exact, there would be too many matches to be useful. Luckily, once you start adding notes before and after the notes around the rest, the search results narrow quickly. Even with 83,000 melodies in the database, a search for the first bar of either of the above examples yields only two “hits”, both of which are Hungarian dances.

Nonetheless, we can’t allow any inaccuracies.  Also, this encoding means a couple dozen melodies give false results when queried about their statistics because some melodies do indeed have pitch differences of more than 41 MIDI values.  

In the next blog post we will discuss the fixes for this “rest” problem of our own making.