Music Notation Handwriting Recognition

Quil:

An experimental system for online music notation handwriting recognition

These slides can be found at slides.iQuil.com

Set up whiteboard connected to MacBook, load slides in Firefox on main displays, prepare iPad and iPhone on downshooter.

Who?

A little about me:

I like beautiful things. I like making things. I like solving problems with my intuition.

I've been called an engineer, I've been mistaken for a mathematician, and there's a piece of paper I'll soon be awarded that says something about mastering science.

Who?

A little about me:

But, really I'm a musician; I'm a composer. Technology is only important when solving human problems.

A musician friend of mine recently referred to me as "Mr. Luddite" and I took it as a compliment.

What?

The system presented here is an attempt to develop a transparent user interface not unlike pencil and paper.

It is best suited for gestural interfaces using either a stylus or a finger for input, but can be used with a mouse or trackpad.

The user can enter music notation with their existing knowledge of conventional Western music notation (CWMN) and directly manipulate the recognized symbols.

Brief history of music notation and its flaws, but it is a system we know
Really the idea here is

Why?

A music notation user interface should allow the user to sketch music notation quickly enough to capture an idea before they forget it.

The tools used shouldn't get in the way and shouldn't force you to think in a certain way.

And, most importantly, the tools shouldn't force any decisions to be made prematurely for the tools' sake.

Current state of the art for music notation handwriting recognition

Several research systems for gestural or online input have been developed, notably by Susan E. George at the University of South Australia and in work at IBM's T.J. Watson Research Center.

Purely gestural interfaces should be distinguished from handwriting recognition: they require the user to learn a specialized set of gestures that are not necessarily related to CWMN.

There has been much more work done in offline optical music recognition (OMR) for both handwritten and typeset CWMN, with significant contributions from Ichiro Fujinaga at McGill University. Offline recognition is fundamentally different in that it deals with pre-existing printed documents.

There are several commercially available offline OMR programs that work on scanned images of typeset scores.

Current state of the art for commonly-used music notation software

Presently, a user typically enters music notation with some combination of a mouse, an alphanumeric keyboard, and a piano-style keyboard.
Current systems often use a palette of symbols that can be selected and placed on a staff.
If I begin sketching, how do I get from this mess…

pencil and paper sketch of the march I wrote for my bride last summer

Finished product

… to this, a finished product to be published and read by others.

score typeset with Nightingale music notation software

There are people in the world who have good hands and can draw music, and then there are the rest of us. There is a big difference between what we quickly sketch for ourselves and what we write to be read by others.

Demonstration

Keep in mind this is experimental software. It is an inchoate implementation, components of which are naively implemented and still include a hack or two as demonstrated.

In other words, things might break.

Ever seen an 'Aw, Snap' error in Google Chrome? Should work in latest beta of Internet Explorer

Demonstration

Please try it on your own device or laptop by pointing your web browser at:

iQuil.com

I'd be flattered if you ignore the rest of the talk while trying things out (that's why I've saved the technical stuff for the back half).

At the conclusion, our violist Sam will perform a few examples volunteered by the audience.



"There's a fine line between a clever heuristic and a hack." --Chris Raphael

"It's such a fine line between stupid and clever." --David St. Hubbins

How does it work?

  1. segmentation: which combination of ink segments forms a symbol.
  2. classification: what is the most likely symbol for a collection of ink.
  3. language model: how the discrete symbols fit together in a meaningful way.

Ink is defined as everything drawn from pen down to pen up.

1. segmentation

The user-drawn ink is segmented to determine which combinations of strokes are most likely to form a symbol.

If these five strokes are drawn, the first stroke forms a treble clef and the remaining four strokes combine to form a sharp.
A stroke is defined as a vector of points from pen down to pen up. Demonstrate drawing a treble clef followed by a sharp. How does the system know these 5 strokes belong to 2 symbols?
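The slides do not say how the grouping is actually computed, so the following is only a minimal sketch of one plausible heuristic: strokes whose horizontal extents overlap (or nearly overlap) are grouped into one candidate symbol. The names xExtent, groupStrokes and maxGap are hypothetical, and the coordinates are made up for illustration.

```javascript
// A stroke is an array of {x, y} points recorded from pen down to pen up.

// Horizontal extent of one stroke.
function xExtent(stroke) {
  const xs = stroke.map((p) => p.x);
  return { min: Math.min(...xs), max: Math.max(...xs) };
}

// Group strokes into candidate symbols: a stroke joins the current group
// when it starts within maxGap of the group's right edge.
// Assumption: strokes are handled left to right and one grouping is
// committed to; a real segmenter would likely score several alternatives.
function groupStrokes(strokes, maxGap = 5) {
  const sorted = strokes
    .map((stroke) => ({ stroke, extent: xExtent(stroke) }))
    .sort((a, b) => a.extent.min - b.extent.min);

  const groups = [];
  let current = null;
  for (const item of sorted) {
    if (current && item.extent.min <= current.maxX + maxGap) {
      current.strokes.push(item.stroke);
      current.maxX = Math.max(current.maxX, item.extent.max);
    } else {
      current = { strokes: [item.stroke], maxX: item.extent.max };
      groups.push(current);
    }
  }
  return groups.map((g) => g.strokes);
}

// Usage: one clef-like stroke, then four strokes that cross each other.
const clefStroke = [{ x: 0, y: 0 }, { x: 10, y: 40 }, { x: 5, y: 80 }];
const sharpStrokes = [
  [{ x: 30, y: 10 }, { x: 30, y: 50 }], // left vertical
  [{ x: 38, y: 8 }, { x: 38, y: 48 }], // right vertical
  [{ x: 26, y: 24 }, { x: 42, y: 20 }], // upper crossbar
  [{ x: 26, y: 38 }, { x: 42, y: 34 }], // lower crossbar
];
const candidateSymbols = groupStrokes([clefStroke, ...sharpStrokes]);
console.log(candidateSymbols.map((s) => s.length)); // strokes per candidate
```

With this made-up input the five strokes fall into two candidates, one stroke for the clef and four for the sharp. Horizontal proximity alone would fail on stacked symbols, which is one reason to treat this as a starting point only.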

2. classification (templates)

Classification always works on user-drawn templates of ink.
Drawn symbols are often very different from typeset symbols.

Here is a demonstration of how the training works.

These are all the symbols currently recognized:


This is only a small subset of all the symbols that constitute CWMN, and only includes discrete symbols.

Notice the viola clef is conspicuously absent. One idea is to incorporate this into a game with a purpose. Show example: attempt to fill in treble clef as typeset.

2. classification (matching)

Both template strokes and input strokes are first normalized: they are scaled to a common size and resampled to a fixed number of equidistant points. Template matching is then performed by calculating the mean of the distances from each drawn point to the nearest hand-trained template point, and vice versa. In other words, the comparison is bidirectional between the drawn ink and the template. This could also be described as nearest-neighbor Euclidean distance.

close: match; far: no match
black is trained template ink, color is user drawn ink
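The matching step described above can be sketched roughly as follows. This is not the project's source: the function names, the 32-point resampling size, the translation to a common origin, and averaging the two directional means into one number are all assumptions made for illustration, and each symbol is treated as a single stroke to keep the example short.

```javascript
// A stroke is an array of {x, y} points from pen down to pen up.

// Resample a stroke to n points spaced equally along its path length.
function resample(points, n) {
  const cum = [0]; // cumulative path length at each original point
  for (let i = 1; i < points.length; i++) {
    const dx = points[i].x - points[i - 1].x;
    const dy = points[i].y - points[i - 1].y;
    cum.push(cum[i - 1] + Math.hypot(dx, dy));
  }
  const total = cum[cum.length - 1];
  if (total === 0) return Array.from({ length: n }, () => ({ ...points[0] }));

  const out = [];
  let seg = 1; // index of the segment end currently being interpolated
  for (let k = 0; k < n; k++) {
    const target = (total * k) / (n - 1);
    while (seg < points.length - 1 && cum[seg] < target) seg++;
    const span = cum[seg] - cum[seg - 1];
    const t = span === 0 ? 0 : (target - cum[seg - 1]) / span;
    out.push({
      x: points[seg - 1].x + t * (points[seg].x - points[seg - 1].x),
      y: points[seg - 1].y + t * (points[seg].y - points[seg - 1].y),
    });
  }
  return out;
}

// Scale into a unit box, anchored at the top-left corner (assumption).
function normalizeScale(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  const size = Math.max(Math.max(...xs) - minX, Math.max(...ys) - minY) || 1;
  return points.map((p) => ({ x: (p.x - minX) / size, y: (p.y - minY) / size }));
}

// Mean distance from each point in `from` to its nearest point in `to`.
function meanNearest(from, to) {
  let sum = 0;
  for (const p of from) {
    let best = Infinity;
    for (const q of to) best = Math.min(best, Math.hypot(p.x - q.x, p.y - q.y));
    sum += best;
  }
  return sum / from.length;
}

// Bidirectional distance: drawn -> template and template -> drawn.
function inkDistance(drawn, template, n = 32) {
  const a = normalizeScale(resample(drawn, n));
  const b = normalizeScale(resample(template, n));
  return (meanNearest(a, b) + meanNearest(b, a)) / 2;
}

// Pick the closest template; templates is [{label, stroke}].
function classifyStroke(drawn, templates) {
  return templates
    .map((t) => ({ label: t.label, distance: inkDistance(drawn, t.stroke) }))
    .sort((x, y) => x.distance - y.distance)[0];
}

// Usage with two made-up single-stroke templates.
const demoTemplates = [
  { label: "barline", stroke: [{ x: 0, y: 0 }, { x: 0, y: 10 }] },
  { label: "ledgerLine", stroke: [{ x: 0, y: 0 }, { x: 10, y: 0 }] },
];
const drawnLine = [{ x: 5, y: 5 }, { x: 5, y: 105 }]; // tall vertical line
const bestMatch = classifyStroke(drawnLine, demoTemplates);
console.log(bestMatch.label, bestMatch.distance);
```

Because both inks are rescaled before comparison, the tall drawn line matches the short vertical template despite the size difference, while the horizontal template ends up far away.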

3. language model

A hand-coded collection of bigrams defines how likely one symbol is to follow another.
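As a rough illustration of how hand-coded bigrams might be combined with the classifier, here is a minimal sketch. The table values, the symbol names, the fallback value for unseen pairs, and the idea of multiplying a shape score by the bigram value are all assumptions for this example, not the system's actual numbers or scoring rule.

```javascript
// Hypothetical bigram table: BIGRAMS[previous][next] says how likely
// `next` is to follow `previous`. Every value here is made up.
const BIGRAMS = {
  start: { trebleClef: 0.8, sharp: 0.1, quarterNote: 0.1 },
  trebleClef: { sharp: 0.5, quarterNote: 0.4, trebleClef: 0.1 },
  sharp: { sharp: 0.3, quarterNote: 0.6, trebleClef: 0.1 },
};
const UNSEEN = 0.01; // small fallback for pairs missing from the table

function bigram(previous, next) {
  const row = BIGRAMS[previous];
  return row && row[next] !== undefined ? row[next] : UNSEEN;
}

// candidates: [{label, shapeScore}], where a higher shapeScore means the
// drawn ink looked more like that symbol. The combined score weights the
// shape evidence by how plausible the symbol is after `previous`.
function rerank(previous, candidates) {
  return candidates
    .map((c) => ({ ...c, score: c.shapeScore * bigram(previous, c.label) }))
    .sort((a, b) => b.score - a.score);
}

// Usage: the shape alone slightly favors a sharp, but a sharp is an
// unlikely first symbol on a staff, so the clef wins after reranking.
const rankedCandidates = rerank("start", [
  { label: "sharp", shapeScore: 0.55 },
  { label: "trebleClef", shapeScore: 0.45 },
]);
console.log(rankedCandidates[0].label);
```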

Typesetting

The music notation is typeset on the HTML5 Canvas using glyphs constructed from the embedded Feta font (from LilyPond).

Don Byrd could point out a half dozen or so problems with this example.

User Correction of symbols

Symbols can be corrected by the user. This feedback informs the classifier.
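One simple way a correction could inform a template-based classifier is to store the corrected ink as a new template under the label the user chose. The sketch below assumes that approach; TemplateStore and applyCorrection are hypothetical names, and the slides do not confirm that the system works this way.

```javascript
// Hypothetical store of hand-drawn templates, keyed by symbol label.
class TemplateStore {
  constructor() {
    this.templates = new Map();
  }
  add(label, ink) {
    if (!this.templates.has(label)) this.templates.set(label, []);
    this.templates.get(label).push(ink);
  }
  count(label) {
    return (this.templates.get(label) || []).length;
  }
}

// When the user relabels a recognized symbol, keep its ink as a new
// template for the corrected label and return the relabeled symbol.
function applyCorrection(store, symbol, correctedLabel) {
  store.add(correctedLabel, symbol.ink);
  return { ...symbol, label: correctedLabel };
}

// Usage: ink that was misread as a natural is corrected to a sharp.
const templateStore = new TemplateStore();
const misread = {
  label: "natural",
  ink: [[{ x: 0, y: 0 }, { x: 0, y: 20 }]],
};
const correctedSymbol = applyCorrection(templateStore, misread, "sharp");
console.log(correctedSymbol.label, templateStore.count("sharp"));
```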

"Humans are always going to win." --Douglas Eck

Implementation

The present implementation works on any late-model web browser with HTML5 support without installing any software or plugins.

All images in this presentation, except where noted, are generated by or are screen captures of the software program.

All code is either markup or interpreted JavaScript; the web address constitutes a reference to a full source code listing.

A testing framework can record and replay serialized ink, as well as take bitmap captures for iterative test comparisons and identification of regressions.
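Recording and replaying ink can be sketched as a JSON round trip. The format below (a version field plus an array of strokes) and the handler names penDown, penMove and penUp are assumptions made for illustration; the real framework's serialization is not shown in these slides.

```javascript
// Serialize recorded strokes (arrays of {x, y} points) to a JSON string.
function serializeInk(strokes) {
  return JSON.stringify({ version: 1, strokes });
}

// Replay serialized ink by feeding each point to pen event handlers,
// in the same order a live pen would produce them.
function replayInk(json, handlers) {
  const { strokes } = JSON.parse(json);
  for (const stroke of strokes) {
    if (stroke.length === 0) continue; // nothing to replay for empty ink
    stroke.forEach((point, i) => {
      if (i === 0) handlers.penDown(point);
      else handlers.penMove(point);
    });
    handlers.penUp(stroke[stroke.length - 1]);
  }
  return strokes.length;
}

// Usage: record two strokes, then replay them into an event log that a
// test could compare against a previously saved log.
const recordedInk = serializeInk([
  [{ x: 0, y: 0 }, { x: 1, y: 1 }, { x: 2, y: 2 }],
  [{ x: 5, y: 5 }, { x: 6, y: 5 }],
]);
const eventLog = [];
replayInk(recordedInk, {
  penDown: (p) => eventLog.push(`down ${p.x},${p.y}`),
  penMove: (p) => eventLog.push(`move ${p.x},${p.y}`),
  penUp: (p) => eventLog.push(`up ${p.x},${p.y}`),
});
console.log(eventLog.join("\n"));
```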

Future work

Improved classifier (either an Artificial Neural Network or a Support Vector Machine)

Train the system on thousands of handwritten examples.

Continuous symbols: beams, slurs, ties.

Combining handwriting with realtime audio input for pitch (singing, for example).

Notation improvements, including chords, multiple voices, and multiple staves.


"Conventional Western music notation does not have well-defined borders; it fades away indefinitely in all directions." --Don Byrd

"There's always one more thing." --Chris Raphael


Acknowledgements

Thanks to Professors Christopher Raphael and Larry Yaeger for their guidance on this project.

Thanks to my wife for reminding me on a daily basis of her intense love of music and her intense hatred of new technology.

Thanks to Donald Byrd, who has taught me a tremendous amount about music notation and music notation software.

Thanks to Samuel Daunt for playing the viola for us today.

Thanks to everyone who has given their time to provide insight and feedback on this project, especially the HCI/design students and faculty.

Thanks to my friends and family.

Thanks to my undergraduate capstone students.

Thanks to the folks at the Revolution Bike and Bean for feeding my caffeine addiction. Thanks to Kate Grigg there for putting me in touch with Sam this afternoon.

Thanks to anyone else I forgot to mention here (you know who you are).

Questions, Show and Tell

Any volunteers with examples for Sam to perform? Any volunteers want to give the whiteboard a try?

As demonstrated: iQuil.com

Bleeding edge (better in theory, but probably more broken): bleed.iQuil.com

Brief demonstration video: demo.iQuil.com

These slides: slides.iQuil.com

Please continue the dialog: chirgwin at indiana dot edu

"The belief in a certain idea gives to the researcher the support for his work. Without this, he would be lost in a sea of doubts and insufficiently verified proofs." --Konrad Zuse