Music Notation Handwriting Recognition
Quil:
An experimental system for online music notation handwriting recognition
These slides can be found at slides.iQuil.com
Who?
A little about me: I like beautiful things. I like making things. I like solving problems with my intuition.
I've been called an engineer, I've been mistaken for a mathematician, and there's a piece of paper I'll soon be awarded that says something about mastering science.
Who?
A little about me: But really, I'm a musician; I'm a composer. Technology is only important when solving human problems.
A musician friend of mine recently referred to me as "Mr. Luddite" and I took it as a compliment.
What?
The system presented here is an attempt to develop a transparent user interface not unlike pencil and paper.
It is best suited for gestural interfaces using either a stylus or a finger for input, but can be used with a mouse or trackpad.
The user can enter music notation with their existing knowledge of conventional Western music notation (CWMN) and directly manipulate the recognized symbols.
Really the idea here is
Why?
A music notation user interface should let the user sketch a musical idea quickly, before they forget it.
The tools used shouldn't get in the way and shouldn't force you to think in a certain way.
And, most importantly, the tools shouldn't force any decisions to be made prematurely for the tools' sake.
Current state of the art for music notation handwriting recognition
Several research systems for gestural or online input have been developed, notably by Susan E. George at the University of South Australia and by researchers at IBM's T.J. Watson Research Center.
Purely gestural interfaces differ in that they require the user to learn a specialized set of gestures that are not necessarily related to CWMN.
There has been much more work done in offline optical music recognition (OMR) for both handwritten and typeset CWMN, with significant contributions from Ichiro Fujinaga at McGill University. Offline recognition is fundamentally different in that it deals with pre-existing printed documents.
There are several commercially available offline OMR programs that work on scanned images of typeset scores.
Current state of the art for commonly-used music notation software
Presently, a user typically enters music notation with a combination of a mouse, alphanumeric keyboard, and/or a piano style keyboard.
Current systems often use a palette of symbols that can be selected and placed on a staff.
If I begin sketching, how do I get from this mess…
Finished product
… to this, a finished product to be published and read by others.
score typeset with Nightingale music notation software
Demonstration
Keep in mind this is experimental software. It is an inchoate implementation, components of which are naively implemented and still include a hack or two as demonstrated.
In other words, things might break.
Demonstration
Please try it on your own device or laptop by pointing your web browser at:
iQuil.com
I'd be flattered if you ignore the rest of the talk while trying things out (that's why I've saved the technical stuff for the back half).
At the conclusion, our violist Sam will perform a few examples volunteered by the audience.
"There's a fine line between a clever heuristic and a hack." --Chris Raphael
"It's such a fine line between stupid and clever." --David St. Hubbins
how does it work?
- segmentation: which combination of ink segments forms a symbol.
- classification: what is the most likely symbol for a collection of ink.
- language model: how the discrete symbols fit together in a meaningful way.
1. segmentation
The user-drawn ink is segmented to determine which combinations of strokes are most likely to form a symbol.
| if these five strokes are drawn | the first stroke forms a treble clef |
| and the remaining four strokes | combine to form a sharp |
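One way to frame this step is as a search over the ways to split the time-ordered strokes into consecutive groups, keeping the split whose groups look most like symbols. The sketch below is an illustration only, not the Quil code: `segment`, `toyCost`, and `maxGroup` are hypothetical names, the toy cost function stands in for a real classifier, and these slides do not say that Quil segments ink this way.

```javascript
// Hypothetical sketch: choose the cheapest way to split an ordered list of
// strokes into consecutive groups. costOf(group) is any function that returns
// a lower number for groups that look more like a single symbol.
function segment(strokes, costOf, maxGroup = 4) {
  const n = strokes.length;
  // best[i] = lowest total cost found for segmenting the first i strokes
  const best = new Array(n + 1).fill(Infinity);
  // cut[i] = index where the last group starts in that best segmentation
  const cut = new Array(n + 1).fill(0);
  best[0] = 0;
  for (let end = 1; end <= n; end++) {
    for (let start = Math.max(0, end - maxGroup); start < end; start++) {
      const total = best[start] + costOf(strokes.slice(start, end));
      if (total < best[end]) {
        best[end] = total;
        cut[end] = start;
      }
    }
  }
  // Walk the cut points backwards to recover the groups in drawing order.
  const groups = [];
  for (let end = n; end > 0; end = cut[end]) {
    groups.unshift(strokes.slice(cut[end], end));
  }
  return groups;
}

// Toy cost, invented for this example: a lone "clef" stroke and a run of four
// "bar" strokes count as good symbols; every other grouping is a poor match.
function toyCost(group) {
  const kinds = group.map((s) => s.kind).join(",");
  if (kinds === "clef") return 0.1;
  if (kinds === "bar,bar,bar,bar") return 0.2;
  return 1.0;
}

const drawn = ["clef", "bar", "bar", "bar", "bar"].map((kind) => ({ kind }));
const groups = segment(drawn, toyCost);
console.log(groups.map((g) => g.length)); // sizes of the chosen groups
```

With the five strokes above, the cheapest split is one stroke followed by four, which mirrors the treble clef and sharp example. Limiting groups to consecutive strokes keeps the search small, but it is a simplifying assumption: a user who goes back to finish an earlier symbol would defeat it.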
2. classification (templates)
Classification always works on user-drawn templates of ink.
Drawn symbols are often very different from typeset symbols.
Here is a demonstration of how the training works.
These are all the symbols currently recognized:


This is only a small subset of all the symbols that constitute CWMN, and only includes discrete symbols.
2. classification (matching)
After both template strokes and input strokes are normalized for scale and resampled to a fixed number of equidistant points, template matching is performed by calculating the mean of the distances from each drawn point to the nearest hand-trained template point, and vice versa. In other words, the comparison is bidirectional between the drawn ink and the template. This could also be described as a nearest-neighbor Euclidean distance.
| close, match | far, no match |
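The matching step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Quil source: the function names are hypothetical, ink is flattened to a single array of `{x, y}` points, scale normalization is done by fitting the bounding box into a unit square, 32 points is an arbitrary choice, and the two directions are combined by a plain average (the slides do not say how they are combined).

```javascript
// Hypothetical sketch of template matching; no names here come from Quil.

// Resample a polyline to `count` (>= 2) points spaced equally along its length.
function resample(points, count) {
  const dist = (a, b) => Math.hypot(b.x - a.x, b.y - a.y);
  let total = 0;
  for (let i = 1; i < points.length; i++) total += dist(points[i - 1], points[i]);
  if (total === 0) return Array.from({ length: count }, () => ({ ...points[0] }));
  const step = total / (count - 1);
  const out = [{ ...points[0] }];
  let seg = 0; // index of the point that starts the current segment
  let walked = 0; // path length from the first point up to points[seg]
  for (let k = 1; k < count - 1; k++) {
    const target = k * step;
    while (seg < points.length - 2 && walked + dist(points[seg], points[seg + 1]) < target) {
      walked += dist(points[seg], points[seg + 1]);
      seg++;
    }
    const d = dist(points[seg], points[seg + 1]);
    const t = d === 0 ? 0 : (target - walked) / d;
    out.push({
      x: points[seg].x + t * (points[seg + 1].x - points[seg].x),
      y: points[seg].y + t * (points[seg + 1].y - points[seg].y),
    });
  }
  out.push({ ...points[points.length - 1] });
  return out;
}

// Move the bounding box to the origin and scale its longer side to 1
// (assumption: the slides only say "normalized by scale").
function normalizeScale(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  const size = Math.max(Math.max(...xs) - minX, Math.max(...ys) - minY) || 1;
  return points.map((p) => ({ x: (p.x - minX) / size, y: (p.y - minY) / size }));
}

// Mean distance from each point in `from` to its nearest point in `to`.
function meanNearest(from, to) {
  let sum = 0;
  for (const p of from) {
    let nearest = Infinity;
    for (const q of to) nearest = Math.min(nearest, Math.hypot(p.x - q.x, p.y - q.y));
    sum += nearest;
  }
  return sum / from.length;
}

// Bidirectional comparison: drawn-to-template and template-to-drawn, averaged.
function matchDistance(drawn, template, count = 32) {
  const a = normalizeScale(resample(drawn, count));
  const b = normalizeScale(resample(template, count));
  return (meanNearest(a, b) + meanNearest(b, a)) / 2;
}

const line = [{ x: 0, y: 0 }, { x: 10, y: 10 }];
const biggerLine = [{ x: 5, y: 5 }, { x: 45, y: 45 }];
const corner = [{ x: 0, y: 0 }, { x: 10, y: 0 }, { x: 10, y: 10 }];
const near = matchDistance(line, biggerLine); // same shape at another scale
const far = matchDistance(line, corner); // a different shape
console.log(near < far);
```

Measuring in both directions matters because a one-way measure can be fooled: every point of a short dash lies close to some point of a long line, but not the other way around.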
3. language model
A hand-coded collection of bigrams defines how likely one symbol is to follow another.
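As a rough illustration of how a bigram table could be combined with classifier output, the sketch below multiplies each candidate's score by the likelihood of that symbol following the previous one. Everything in it is an assumption made for the example: the symbol names, the probabilities, the `UNSEEN` floor, and the choice to multiply the two numbers are all invented, and none of it is taken from the Quil source.

```javascript
// Hypothetical sketch: rescoring classifier candidates with hand-coded bigrams.
// bigrams[previous][next] = how likely `next` is to follow `previous`.
// All names and numbers below are made up for illustration.
const bigrams = {
  "treble-clef": { sharp: 0.5, "quarter-note": 0.3, "treble-clef": 0.01 },
  sharp: { sharp: 0.3, "quarter-note": 0.6, "treble-clef": 0.01 },
};
const UNSEEN = 0.001; // small floor for pairs missing from the table

function bigramProbability(previous, next) {
  const row = bigrams[previous];
  return row && row[next] !== undefined ? row[next] : UNSEEN;
}

// candidates: [{ symbol, score }], where score is a classifier confidence
// in (0, 1]. Returns the candidate with the highest combined value.
function pickSymbol(previous, candidates) {
  let best = null;
  for (const c of candidates) {
    const combined = c.score * bigramProbability(previous, c.symbol);
    if (best === null || combined > best.combined) best = { ...c, combined };
  }
  return best;
}

// The classifier slightly prefers a second clef, but a sharp is far more
// plausible right after a treble clef, so the bigram tips the balance.
const choice = pickSymbol("treble-clef", [
  { symbol: "treble-clef", score: 0.55 },
  { symbol: "sharp", score: 0.45 },
]);
console.log(choice.symbol);
```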
Typesetting
The music notation is typeset on the HTML5 Canvas using glyphs constructed from the embedded Feta font (from LilyPond).
User Correction of symbols
Symbols can be corrected by the user. This feedback informs the classifier.
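One plausible way for a correction to inform a template-based classifier is to file the corrected ink as a new template under the right symbol. The sketch below assumes exactly that; the slides do not say how Quil uses the feedback, and `templates`, `train`, and `applyCorrection` are hypothetical names.

```javascript
// Hypothetical sketch: a user correction becomes a new labeled template.
const templates = new Map(); // symbol name -> array of ink samples

// Store one ink sample as a template for the given symbol.
function train(symbol, ink) {
  const samples = templates.get(symbol) ?? [];
  samples.push(ink);
  templates.set(symbol, samples);
}

// The user says the recognized symbol was wrong: keep the ink, relabel it,
// and remember the ink as a template for the symbol the user chose.
function applyCorrection(recognized, correctedSymbol) {
  train(correctedSymbol, recognized.ink);
  return { ...recognized, symbol: correctedSymbol };
}

const recognized = { symbol: "flat", ink: [[{ x: 0, y: 0 }, { x: 1, y: 4 }]] };
const fixed = applyCorrection(recognized, "quarter-note");
console.log(fixed.symbol, templates.get("quarter-note").length);
```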
"Humans are always going to win." --Douglas Eck
Implementation
The present implementation works on any late-model web browser with HTML5 support without installing any software or plugins.
All images in this presentation, except where noted, are generated by or are screen captures of the software program.
All code is either markup or interpreted JavaScript; the web address constitutes a reference to a full source code listing.
A testing framework can record and replay serialized ink, as well as take bitmap captures for iterative test comparisons and identification of regressions.
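To give a feel for the record-and-replay idea, here is a minimal sketch that stores pointer events as JSON and feeds them back through a handler. It is an assumption-laden illustration rather than the actual testing framework: `InkRecorder`, `replay`, `collectStrokes`, and the `{ type, x, y, t }` event shape are all hypothetical, and the bitmap capture side is left out.

```javascript
// Hypothetical sketch: record ink events, serialize them, and replay them.
class InkRecorder {
  constructor() {
    this.events = [];
  }
  // type is "down", "move", or "up"; t is a timestamp in milliseconds.
  record(type, x, y, t) {
    this.events.push({ type, x, y, t });
  }
  serialize() {
    return JSON.stringify(this.events);
  }
}

// Feed every recorded event, in order, to the given handler.
function replay(serialized, handler) {
  for (const event of JSON.parse(serialized)) handler(event);
}

// Example handler: rebuild strokes, starting a new one on each "down" event.
function collectStrokes(serialized) {
  const strokes = [];
  replay(serialized, (e) => {
    if (e.type === "down") strokes.push([]);
    if (strokes.length > 0) strokes[strokes.length - 1].push({ x: e.x, y: e.y });
  });
  return strokes;
}

const recorder = new InkRecorder();
recorder.record("down", 0, 0, 0);
recorder.record("move", 5, 5, 16);
recorder.record("up", 10, 10, 32);
recorder.record("down", 20, 0, 500);
recorder.record("up", 20, 10, 540);

const saved = recorder.serialize();
const strokes = collectStrokes(saved);
console.log(strokes.length, strokes.map((s) => s.length));
```

Because the saved ink is plain JSON, the same session can be replayed after every code change and the results compared with an earlier run to spot regressions.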
Future work
Improved classifier (either an Artificial Neural Network or a Support Vector Machine)
Train the system on thousands of handwritten examples.
Continuous symbols: beams, slurs, ties.
Combining handwriting with realtime audio input for pitch (singing, for example).
Notation improvements: chords, multiple voices, multiple staves.
"Conventional Western music notation does not have well-defined borders; it fades away indefinitely in all directions." --Don Byrd
"There's always one more thing." --Chris Raphael
Acknowledgements
Thanks to Professors Christopher Raphael and Larry Yaeger for their guidance on this project.
Thanks to my wife for reminding me on a daily basis of her intense love of music and her intense hatred of new technology.
Thanks to Donald Byrd, who has taught me a tremendous amount about music notation and music notation software.
Thanks to Samuel Daunt for playing the viola for us today.
Thanks to everyone who has given their time to provide insight and feedback on this project, especially the HCI/design students and faculty.
Thanks to my friends and family.
Thanks to my undergraduate capstone students.
Thanks to the folks at the Revolution Bike and Bean for feeding my caffeine addiction. Thanks to Kate Grigg there for putting me in touch with Sam this afternoon.
Thanks to anyone else I forgot to mention here (you know who you are).
Questions, Show and Tell
Any volunteers with examples for Sam to perform? Any volunteers want to give the whiteboard a try?
As demonstrated: iQuil.com
Bleeding edge (better in theory, but probably more broken): bleed.iQuil.com
Brief demonstration video: demo.iQuil.com
These slides: slides.iQuil.com
Please continue the dialog: chirgwin at indiana dot edu
"The belief in a certain idea gives to the researcher the support for his work. Without this, he would be lost in a sea of doubts and insufficiently verified proofs." --Konrad Zuse


