FAQ – Skiptune

Q. Who is the audience for this website?
A. The audience for the Skiptune website is anyone with an insatiable curiosity about music, especially melodies. Those with both formal training in music and a facility with numbers will have the easiest time understanding this website, but that’s a very small audience, so we strive either to avoid musical and statistical jargon or to explain it as much as possible. We aim to be accessible to anyone who can sight-read sheet music and play a musical instrument.

Q. What are your criteria for whether a tune qualifies for the database?
A. The criteria qualifying a tune for the database are:

The tune must have stood the test of time, defined by having survived at least two generations. According to ancestry.com the oft-cited length of a generation was 20-25 years for most of human history. Recent evidence suggests that a generation today could be in the 27-35 year range, but to be conservative we will stay with the upper end of the commonly used range, or 25 years. So in 2025 all music that was written in 1975 and earlier, for which there’s some evidence that it was being played 50 years later, qualifies. The idea is that if a grandchild is still playing music written in the grandparent’s generation, that tune has stood the test of time.
The tune is expressed (written) in standard Western-style musical notation. While in some cases we will make a special effort to transform, say, a song in tablature format to standard notation, time considerations make it necessary to limit ourselves to music written in standard form.
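
The arithmetic behind the two criteria above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the database: the function names and the two yes/no flags are hypothetical, and the judgment about whether a tune is "still being played" is reduced here to a simple boolean.

```python
# Minimal sketch of the qualification rule, under stated assumptions.
GENERATION_YEARS = 25      # upper end of the commonly used 20-25 year range
GENERATIONS_REQUIRED = 2   # a tune must survive at least two generations


def cutoff_year(current_year):
    """Latest year of composition that can qualify in current_year."""
    return current_year - GENERATION_YEARS * GENERATIONS_REQUIRED


def qualifies(year_written, current_year, still_played, in_standard_notation):
    """Hypothetical check combining both criteria from the list above."""
    old_enough = year_written <= cutoff_year(current_year)
    return old_enough and still_played and in_standard_notation


print(cutoff_year(2025))                    # 1975
print(qualifies(1975, 2025, True, True))    # True
print(qualifies(1990, 2025, True, True))    # False
```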

Q. How do you decide which melodies or tunes to put into your database?
A. The goal is to get as many of the world’s melodies and tunes into the database as possible, a never-ending task, so our priority is to stay diverse. We choose tunes that belong to a variety of genres (e.g., country-western, rock, baroque), from different time periods, and from different cultures or geographic areas.

Q. Do you really enter tunes from any culture worldwide?
A. No. The tunes have to be expressed in standard Western notation. We don’t enter songs written, for instance, in Gregorian chant notation, Bulgarian folk notation, Chinese musical notation systems, or any of the ancient systems of music notation. Even “Guido the Monk’s” notation is excluded (he invented the “note”). That means we can never be completely representative, but long-lasting tunes tend to make their way into standard Western-style notation.

Q. Why do you ignore harmony, bass, tempo, timbre, and lyrics? Those elements are what make music, not just melody.
A. Agreed, but melodies are fascinating in and of themselves and deserve study and research as a stand-alone body of work.

Q. How do you handle polyphonic music (music that has more than one melodic line)?
A. Our answer is subjective in that each tune is a judgment call. It can be difficult to differentiate between a second melodic line and a harmonic line. We tend to err on the side of being overly inclusive for duets and under-inclusive for three or more melodic lines, entering only the top melody or, if no melody can be discerned, not entering the piece at all. We don’t enter ancient polyphonic music unless someone transcribes it into modern Western notation. We do have a few polyphonic pieces entered for comparison purposes.

Q. How do you deal with rests?
A. The simple answer is that rests are pitches with a MIDI value of zero to indicate silence. But simply defining rests as zero creates difficulty in our metrics because it conflicts with another principle we have established: we want to be able to ignore the key (or mode) a tune is written in, and we do so by defining our metrics so that each note is relative to the note that follows it. However, that means that the F below the bass clef staff, which has a MIDI value of 41, would be represented as a pitch change of “41” in our database if it is followed by a rest. But 41 is less than 42, the value we assign to going to a rest. That creates a discrepancy in our metrics because the other rests are coded as 42 in order to be consistent with our principle that pitch changes are relative. As of 2019 there are only a dozen or so instances (out of millions) of such exceptions. If that number increases substantially, we will find a better way to deal with the issue.

Q. How do you deal with long rests, say several measures long, that you find in some musical compositions?
A. This is another unavoidable judgment call. Often the melody will be picked up by another voice, so we simply enter that voice’s melodic line. If there is silence, we keep the rests as rests or, if it makes sense, we break the melody into two melodies using the long rests as a breaking point (this is the most common approach to long rests). Long rests do present problems that are difficult to address consistently, and that in turn may create noise in our analysis. As long as the “law of large numbers” is enough to overcome the noise, the analytics still have validity.

Q. How do you address the fact that musical notation has evolved over the centuries and it’s not always clear what the precise note pitches and durations are?
A. Yes, musical notation has changed greatly, even after the Renaissance, and the changes happened gradually, so it is not always clear what the composer meant. Add the fact that composers make mistakes in their notation, and we are faced with quite a challenge. We do the best we can to interpret scores in their historical context and record the melodies in the database as they probably would have been written in modern musical notation. In cases where we simply have no idea what was meant, we do not enter the piece into the database. Individual scholars all have their own ideas on any given piece, so any single interpretation is bound to be considered “wrong” by someone. This is a problem primarily found in Renaissance and Baroque music. Later music tends to be written with far less ambiguity as to the composer’s intent.

In general, however, we stick with the notes as written when we can, even though much was left to the performer to stylize the piece. This has been true throughout musical history, but it is only the written notes that have survived historically, so that’s all we have to go on.

Q. How do you decide whether a tune fits a specific genre?
A. To a large extent, it’s a judgment call. We rely a lot on how other people classify a specific tune when that’s available. When we have to rely on our own judgment, we consider all relevant factors and decide. In the end, the decision to put a tune into a genre is testable by comparing the qualities of that melody against tunes that unquestionably belong to the genre. We are developing analytic tools, based on objective data, to test one tune against another, and we can use them in turn to test how good our genre classification has been and reassign genres accordingly.

Q. How do you account for the fact that a tune can be played in many different styles? For instance, a good musician can play a tune in a rock style, a swing style, and a country-western style.
A. We ignore performance styles and enter melodies as they are written on the page in sheet music. However, tunes that are flexible and popular enough to be written in many different styles are all tagged with those styles because it’s important information about the tune itself. Part of our fascination is that tunes evolve over the centuries because someone decides the written version doesn’t match what his or her “ear” says is the correct version.

Q. Some tunes reach the end and then repeat by starting over. How do you handle repeats?
A. Handling ‘repeats’ is an area of necessary compromise because putting in every repeat with every possible ending variation would make the database too large to manipulate in a timely manner. We make one of the two following compromises:

1 — We separate a melody into multiple melodies using first, second, etc., endings as breaking points. While doing so is a compromise, it at least ensures that the second ending is captured in the database. However, if a tune starts with pick-up notes and merely repeats, we usually just record the tune with the final ending. That means the tune ends with some rests rather than the pick-up notes that we would normally use if there were a second part to the song.
2 — We miss capturing the transition between the original tune and whatever is added to it. An example of this is Bunting’s 1840 version of The Robber (aka Charlie Reilly), which is only eight measures long. It’s used later as the first half of O’Neill’s 1903 version of Charles O’Reilly. Consequently, The Robber is found to have a unique pattern consisting of the last eight notes, including the final rest note. Charles O’Reilly’s first half is identified as a unique tune because its unique notes include the repeat at the end, which is not technically correct because it’s an exact repeat of The Robber. See our explanation of this example in more detail in How Many Notes Are Needed to Identify a Tune?

Q. In ancient music composers often didn’t add accidentals (sharps and flats) to notes, expecting musicians to simply know when to use them. How do you decide whether to use modern suggestions for accidentals?
A. Yes, from the invention of the note in the year 1030 by Guido the Monk all the way through the Renaissance and a little beyond, composers expected performers to make certain pitch adjustments according to a system we call musica ficta. Musicologists disagree among themselves on precisely how music written during the Renaissance and earlier sounded, and many disagree on specific instances, so we felt it best to enter the music as written, recognizing that in some cases the composer may have meant for an accidental to be used. If we encounter two written versions of the same tune, one using musica ficta and one not, we enter both.

Q. How do you deal with the practice of carrying the effect of an accidental throughout an entire measure? That practice wasn’t universal until around 1700 (well into the Baroque era).
A. While it is true that some composers stuck to the old rule of notating each affected note with an accidental, by the 1600s the new convention was gaining in popularity. The new convention was to apply an accidental throughout the measure. Because this new rule rapidly became the norm starting with the Baroque era, we use it from 1600 on when entering tunes into the database unless we know from the historical record that a particular composer followed the old convention.
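
The difference between the two conventions can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not our actual entry procedure: the function name and data layout are hypothetical, each note is given as its natural (unaltered) MIDI pitch plus an optional written accidental, and key signatures are ignored.

```python
def sounding_pitches(measure, carry=True):
    """Resolve written accidentals within one measure (illustrative only).

    measure: list of (natural_midi_pitch, accidental) pairs, where
             accidental is +1 (sharp), -1 (flat), 0 (natural sign),
             or None (nothing written on that note).
    carry:   True  -> newer convention, an accidental lasts to the bar line
             False -> older convention, an accidental affects one note only
    """
    active = {}   # natural pitch -> accidental currently in force
    result = []
    for natural_pitch, accidental in measure:
        if accidental is not None:
            shift = accidental
            if carry:
                active[natural_pitch] = accidental
        else:
            # Unmarked note: inherits an earlier accidental only when carrying.
            shift = active.get(natural_pitch, 0)
        result.append(natural_pitch + shift)
    return result


# F sharp, unmarked F, G, F with a natural sign (F above middle C is MIDI 65)
measure = [(65, +1), (65, None), (67, None), (65, 0)]
print(sounding_pitches(measure, carry=True))    # [66, 66, 67, 65]
print(sounding_pitches(measure, carry=False))   # [66, 65, 67, 65]
```

In this sketch, a tune dated 1600 or later would be resolved with carry=True unless the composer is known to have followed the old convention.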

Q. How do you define a pattern of notes?
A. A pattern is just a specific series of notes. Each note has a pitch and a duration. For instance, “Baa, Baa Black Sheep” begins with two quarter notes at the same pitch. That’s one pattern. A quarter note followed by a half note of the same pitch would be another pattern.

Q. What do your pairs of numbers mean?
A. Each pattern is represented by two numbers. The first number is the result of subtracting the MIDI value of the first note from that of the second note, yielding an interval. The second number is the result of dividing the second note’s duration by the first note’s duration, yielding a duration ratio. For instance, the first pattern in “Baa, Baa Black Sheep” is two consecutive quarter notes of the same pitch, corresponding to the lyric, “Baa, baa”. That pattern is represented in the database as [0, 1] because there’s no difference in pitch and the notes are of equal duration.
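
As a rough illustration of the calculation just described, here is a minimal Python sketch. The function name and the (pitch, duration) layout are assumptions made for the example, not the database's actual format. Durations are counted in quarter notes, and the four pitches are the commonly printed opening of the tune in C major; the key is an arbitrary choice.

```python
from fractions import Fraction


def to_patterns(notes):
    """Turn a list of (midi_pitch, duration) notes into (interval, ratio) pairs.

    interval = MIDI value of the second note minus that of the first
    ratio    = duration of the second note divided by that of the first
    """
    patterns = []
    for (pitch1, dur1), (pitch2, dur2) in zip(notes, notes[1:]):
        interval = pitch2 - pitch1
        ratio = Fraction(dur2) / Fraction(dur1)   # exact, avoids float rounding
        patterns.append((interval, ratio))
    return patterns


# "Baa, baa, black sheep": C C G G, all quarter notes (duration 1)
opening = [(60, 1), (60, 1), (67, 1), (67, 1)]
first = to_patterns(opening)[0]
print(first[0], first[1])   # prints: 0 1
```

A quarter note followed by a half note of the same pitch would come out as an interval of 0 and a duration ratio of 2.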

Q. How do you represent note pitches?
A. We use the open source MIDI system.
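
For readers unfamiliar with MIDI numbering, the following Python sketch converts a note name to its MIDI value. It assumes the common convention in which middle C is called C4 and has the value 60 (some manufacturers label middle C as C3); the function name is hypothetical.

```python
# Semitones above C for each natural note name
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(letter, octave, accidental=0):
    """MIDI value of a note, assuming middle C is C4 = 60.

    accidental: +1 for a sharp, -1 for a flat, 0 for a natural note.
    """
    return 12 * (octave + 1) + SEMITONES[letter] + accidental


print(midi_number("C", 4))   # 60, middle C
print(midi_number("F", 2))   # 41, the F just below the bass clef staff
print(midi_number("A", 4))   # 69, the A above middle C
```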

Q. How do you code note durations?
A. See the definition of “duration” on the definitions page.

Q. Do you include tunes with harmonics in them?
A. Yes, but we code the harmonic notes as a fundamental MIDI value. The use of harmonics is confined to a few instruments, is relatively rare, and coding them as their true MIDI value is left for some future time.

Q. Why are some of your key signatures wrong?
A. The key signatures are only used to indicate the number of sharps or flats in a tune. For the purposes of this effort we don’t need to know whether a tune is in a Dorian, Mixolydian, or other mode. By representing all melodies as intervals and duration ratios we avoid the need to worry about keys or modes.
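
A small Python sketch shows why the key drops out. Transposing a melody adds the same number to every MIDI value, so the differences between consecutive notes do not change. The example pitches and names are made up for illustration.

```python
def intervals(pitches):
    """Differences between consecutive MIDI values."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


tune_in_c = [60, 60, 67, 67, 69, 69, 67]       # starts on middle C
tune_in_d = [p + 2 for p in tune_in_c]         # same tune, a whole step higher

print(intervals(tune_in_c))                          # [0, 7, 0, 2, 0, -2]
print(intervals(tune_in_c) == intervals(tune_in_d))  # True
```

Duration ratios are likewise unaffected, since transposition does not touch note lengths.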

Q. How do you deal with little changes that occur in folk music over generations?
A. Yes, melodies evolve, get borrowed (or stolen), errors are introduced and remain uncorrected for decades, and so on. We include a tune even if it has one or two minor changes from a tune already in the database, especially if those changes relate to, say, adding a verse that requires an internal pick-up note not in the previous tune, or a slightly different ending. We usually just code these altered tunes as a variation on the first tune, but sometimes we note them as a derivative of the first tune. One of the more interesting results from the analysis is the frequency of common patterns that fall out from cross-pollination and evolution of tunes. After all, there are limits to how much a tune can evolve before it becomes another tune. While such an evolution is a continuum, part of our effort is to invent metrics to track that evolution.

Q. How do you deal with melodies with caesura (two slash marks) or fermata in them?
A. We code caesura as a rest whose duration is consistent with the time signature. We code fermata as the given note’s duration. Caesura and fermata are both used for effect and thus are styles.

Q. How do you deal with cadenzas?
A. If the cadenza is long enough, we code it as a separate melody because it really is a stand-alone bit of music that cannot be grafted onto what precedes or follows it if we are to observe the standard rules of a measure’s rhythmical length.

Q. What surprised you the most during your research?
A. We’re surprised by how many tunes can be identified by a single pattern of two consecutive notes. We expected as we began that we would need at least 4 or 5 notes to make a tune unique. For instance, “The House of the Rising Sun” contains a two-note pattern found nowhere else in any genre or era. Even after tens of thousands of tunes have been entered from earliest times on to 50 years ago, across dozens of countries and cultures, and across dozens of musical styles, roughly two in 100 tunes can be uniquely identified by a single pattern of only two notes. That’s amazing.
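
The idea of a tune being identified by a single two-note pattern can be sketched in Python as follows. This is a toy illustration with made-up tune names and patterns, not our actual analysis code: it simply collects the (interval, duration ratio) pairs that occur in exactly one tune.

```python
from collections import defaultdict


def uniquely_identifying_patterns(tunes):
    """Map each tune name to the patterns that appear in no other tune.

    tunes: dict of tune name -> list of (interval, duration_ratio) pairs.
    Tunes with no unique pattern are left out of the result.
    """
    owners = defaultdict(set)            # pattern -> names of tunes using it
    for name, patterns in tunes.items():
        for pattern in patterns:
            owners[pattern].add(name)

    unique = defaultdict(set)
    for pattern, names in owners.items():
        if len(names) == 1:              # found in exactly one tune
            (name,) = names
            unique[name].add(pattern)
    return dict(unique)


# Hypothetical toy database of three tunes
toy = {
    "tune_a": [(0, 1), (7, 1), (0, 1)],
    "tune_b": [(0, 1), (2, 2)],
    "tune_c": [(0, 1), (7, 1)],
}
print(uniquely_identifying_patterns(toy))   # {'tune_b': {(2, 2)}}
```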

Q. Why don’t you list all the names of the tunes you have in the database?
A. There are simply too many. With tens of thousands of tunes, each on average with three different names, that would make for some unwieldy pages. Perhaps someday we will put up a link to the database, but for now we’re focused on entering more tunes and exploring the richness of the database.