The purpose of the project is to make an application that can play along with a live musician. Think of those old country-western albums: this is the digital equivalent of the guy who sits in the back with an electric guitar and interjects notes whenever the spirit moves him. The scope of this project may change, but this is what I am starting with.
I have broken this project into parts:
Gathering Information From The User
Pitch Detection
The machine has to be able to hear what the user is playing. For this, I chose the Max/MSP language for all interactions with the sound card. Max/MSP supports both MIDI data and “signals” as primitives, and makes filtering those signals easy. For pitch detection, I wrote a bank of bandpass filters. This will be the topic of the next post!
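To give a rough idea of what a bank of bandpass filters does, here is a minimal Python sketch. It is not the Max/MSP patch itself. It assumes one two-pole bandpass filter per semitone, a monophonic input, and an 8 kHz sample rate; the names `bandpass_energy`, `detect_pitch`, and `sine` are made up for illustration. The band with the most output energy is taken as the pitch.

```python
import math

SAMPLE_RATE = 8000  # Hz; an assumed rate, kept low so the pure-Python loop stays fast

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def bandpass_energy(samples, center_hz, q=30.0):
    """Run samples through one two-pole bandpass filter; return mean output power."""
    w0 = 2.0 * math.pi * center_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # the middle feed-forward term is zero
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    total = 0.0
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        total += y * y
    return total / len(samples)

def detect_pitch(samples, low_note=48, high_note=84):
    """Return the MIDI note whose band carries the most energy."""
    return max(range(low_note, high_note + 1),
               key=lambda note: bandpass_energy(samples, midi_to_hz(note)))

def sine(freq_hz, seconds=0.25):
    """Synthetic test tone standing in for live input."""
    count = int(SAMPLE_RATE * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]

a4 = detect_pitch(sine(440.0))
```

A real instrument has overtones and noise, so picking the single loudest band is only a starting point.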
Key Detection
Once the machine has a collection of pitches, it must make sense of them. I will most likely use a Hidden Markov Model or decision trees to determine the key of a musical piece, depending on which gets the job done with the least amount of overhead.
Max/MSP comes with a Java API, and there is more than one way to use Prolog from within Java. This gives me a lot of options! Honestly, I am aching for an excuse to use Prolog in something bigger than a toy problem, and this seems like the perfect opportunity. I will probably go with InterProlog from Java. This most likely means I will be going with decision trees for key detection.
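For a sense of the problem's shape, here is a deliberately naive baseline in Python. It is neither a Hidden Markov Model nor a decision tree: it simply counts how many detected notes fall inside each major scale and picks the best fit. The function name `guess_major_key` is hypothetical, and it ignores minor keys entirely.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the tonic
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def guess_major_key(midi_notes):
    """Return the name of the major key whose scale covers the most detected notes."""
    def coverage(tonic):
        scale = {(tonic + step) % 12 for step in MAJOR_SCALE}
        return sum(1 for note in midi_notes if note % 12 in scale)
    best_tonic = max(range(12), key=coverage)  # ties go to the lowest tonic
    return NOTE_NAMES[best_tonic]

key = guess_major_key([60, 62, 64, 65, 67, 69, 71, 72])
```

Anything smarter has to break the ties this approach leaves open, for example telling a key apart from its neighbours when only a few notes have been heard.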
Beat Detection
Davies, Brossier, and Plumbley wrote an amazing paper on non-symbolic beat tracking. There are a few papers on this topic, but this is the one I will be basing my beat tracking system on. The details on this will, of course, be in a different post.
Using That Information
After the machine has information about the rhythmic and tonal structure of the piece, it can use this all to make some notes of its own. My aim right now is to have it generate short phrases to push the music along, but really there are any number of directions this could go in.
Markov Chains
I recently discovered that Max/MSP has first-order Markov chains built right into the language. They just stuck them right in there. This may be a solution, but it is not my favorite.
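For anyone who has not met them, a first-order Markov chain picks each new note based only on the note before it. The Python sketch below shows the idea; it does not use or imitate the Max/MSP object, and `build_chain`, `generate`, and the `heard` note list are invented for the example.

```python
import random
from collections import defaultdict

def build_chain(notes):
    """Map each note to the list of notes that followed it in the input."""
    chain = defaultdict(list)
    for current, following in zip(notes, notes[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng):
    """Walk the chain: each new note depends only on the one before it."""
    phrase = [start]
    while len(phrase) < length:
        options = chain.get(phrase[-1])
        if not options:  # dead end: this note was never followed by anything
            break
        phrase.append(rng.choice(options))
    return phrase

heard = [60, 62, 64, 62, 60, 64, 65, 67, 65, 64, 62, 60]  # made-up MIDI notes
chain = build_chain(heard)
phrase = generate(chain, start=60, length=8, rng=random.Random(1))
```

Because followers are stored with repeats, common transitions are chosen more often, which is all the "learning" there is.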
Genetic Algorithms
After toying around with genetic algorithms this summer, I made an observation: they yield one “optimal” solution, but hundreds of other solutions that are almost as good. I can use this to my advantage here because
- “Tonal harmony” is a huge space to search, and
- “optimal” is subjective in music.
When I say that “tonal harmony” is huge, I mean that there are twelve notes in about four registers, and each note can take about fourteen different rhythmic values. Can you see why, even for one measure, it is not realistic to search through all possible combinations?
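To put a number on that, here is the arithmetic as a few lines of Python. The three counts come straight from the paragraph above, reading the fourteen values as note durations. The eight-note phrase length is an assumption picked only for illustration.

```python
PITCH_CLASSES = 12  # notes per octave
REGISTERS = 4       # approximate, from the text
DURATIONS = 14      # approximate, from the text

# Every note in a phrase is one pitch class, in one register, with one duration.
choices_per_note = PITCH_CLASSES * REGISTERS * DURATIONS

def phrase_count(notes_in_phrase):
    """Number of distinct phrases of the given length."""
    return choices_per_note ** notes_in_phrase

eight_note_phrases = phrase_count(8)
```

That is 672 choices per note, and more than 10^22 possible eight-note phrases, before even considering rests.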
When I say optimal is subjective, I mean this: In general, a major third is going to sound better than a minor second. A major third is consonant, a minor second is dissonant. However, a minor second can introduce tension and motion in ways that a major third cannot. At any point in a song, is it better to play a major third or a minor second? It depends on the context. What about a major third over a minor third? It depends on the mood. A perfect fifth over a diminished fifth? It depends on whether you are playing East Asian folk music or river delta blues.
I guess it would be possible to determine mood statistically (or by some other means), but I am not going to try right off. I am going to leave that part out to begin with and see how things turn out.
What this means is, these “suboptimal” solutions returned from the genetic algorithm can still be used. The plan was to use short phrases, just long enough to drive the music, but not so long as to dominate the melody. If it so happens that some phrases reflect different moods than the piece had originally, all the better! The machine has helped drive the song in a different direction. That’s not so bad.
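Here is a toy Python sketch of that idea: evolve a population of short phrases, then keep several of the best instead of just one. Everything in it is an assumption made for illustration, including the fitness rule (reward notes in C major and small melodic steps), the population size, and the mutation rate. The real rules are still to be designed.

```python
import random

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C major scale
LOW, HIGH = 48, 84                # MIDI note range searched (assumed)

def fitness(phrase):
    """Toy score: one point per in-key note, one per small melodic step."""
    in_key = sum(1 for note in phrase if note % 12 in C_MAJOR)
    steps = sum(1 for a, b in zip(phrase, phrase[1:]) if 1 <= abs(a - b) <= 4)
    return in_key + steps

def evolve(rng, length=8, pop_size=60, generations=80, keep=5):
    """Return up to `keep` distinct phrases, best first."""
    pop = [[rng.randint(LOW, HIGH) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            if rng.random() < 0.3:          # occasional one-note mutation
                child[rng.randrange(length)] = rng.randint(LOW, HIGH)
            children.append(child)
        pop = parents + children
    unique = sorted({tuple(p) for p in pop}, key=fitness, reverse=True)
    return [list(p) for p in unique[:keep]]

candidates = evolve(random.Random(7))
```

The last two lines of `evolve` are the point: the runners-up are kept as usable phrases rather than thrown away.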
Spitting Out Notes
Now that the machine knows what notes to play and when to play them, it has to actually make it happen.
MIDI
This project will most likely output its notes in MIDI form. The focus is not on realistic timbres of instruments, but rather on the realistic use (timing and phrasing) of them. It’s more important to get the notes in the right place! The textures can come later. Also, I can save some processing time by leaving the synthesis for something external.
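As a small illustration of how little data that involves, here is a Python sketch that turns a phrase into raw MIDI Note On and Note Off messages with timestamps. The status bytes (0x90 and 0x80 plus the channel number) are standard MIDI. The phrase format, the function names, and the half-second beat are assumptions for the example, and nothing here talks to a real MIDI port.

```python
def note_on(note, velocity=100, channel=0):
    """Raw Note On message: status 0x90 plus channel, then note and velocity (0-127)."""
    return bytes([0x90 | channel, note, velocity])

def note_off(note, channel=0):
    """Raw Note Off message: status 0x80 plus channel, then note and release velocity."""
    return bytes([0x80 | channel, note, 0])

def phrase_to_events(phrase, beat_seconds=0.5):
    """Turn (note, start_beat, length_beats) tuples into time-ordered (seconds, message) pairs."""
    events = []
    for note, start, length in phrase:
        events.append((start * beat_seconds, note_on(note)))
        events.append(((start + length) * beat_seconds, note_off(note)))
    return sorted(events)

events = phrase_to_events([(60, 0, 1), (64, 1, 1)])
```

Each message is three bytes, so an external synthesizer does all the heavy lifting.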
The Guru
The user should have a say in what the machine plays! Sometimes it is just not appropriate to have it play notes in the background, or use only major scales, or. . .
The design for this application (right now) includes controls for halting output, tweaking the rules in the genetic algorithm, forcing the machine to use a certain key, and other things. I am trying to keep user interaction a priority, so this part is important. This machine should be treated as just another instrument. It just happens to be a smart one.
Next time on HBFS: Pitch Detection