
Automatic score reader -- OMR system and text-based musical notation

* Overview
Overscore is an automatic score reader that recognizes sheet music and plays it using [[http://overtone.github.com/][Overtone]]. Overscore consists of three main stages:
  - an /Optical Music Recognition/ (OMR) system that recognizes scores and compiles them into MusicXML files
  - a MusicXML parser that compiles MusicXML files into Clojure files
  - multiple helper macros and functions built on top of Overtone, providing a system better suited to encoding classical music

This project is the result of work done at the ULB (Université Libre de Bruxelles), for the course PROJ-H-402.

* Dependencies
Aside from the dependencies listed in =project.clj=, OpenCV is also needed for the classification step, and the classifier (which is written in C++) should be compiled:

#+BEGIN_SRC shell
cd src/overscore/recognition/classification
g++ opencv_knn.cpp -o opencv_knn `pkg-config opencv --libs --cflags`
#+END_SRC

The Gamera Python plugin [[http://music-staves.sf.net/][music-staves]] is also needed for staff line detection.

* Usage
The application can be launched using [[http://leiningen.org/][leiningen]]:

#+BEGIN_SRC shell
$ lein2 run help
Possible arguments:
convert <in> <out>
    Convert symbols from the Audiveris training set to png
    images in the <out> directory (from the <in> directory)

preprocessing <in> <out> <out-ref>
    Preprocess the image <in>, saving the output image to
    <out> and the reference lengths descriptions in
    <out-ref>

staffline <in>
    Isolate the systems and find the staffline positions
    on each system, from the image <in>. Saves the output
    for each system to <in>-n.png and <in>-n.txt, where n
    is an integer

segmentation <in-img> <in-refs> <out-segs>
    Segment the image <in-img>, with reference lengths
    described in <in-refs>, isolating each symbol. Save
    the segments descriptions in <out-segs>

classification <training-set> <in-img> <in-segs> <out-classes>
    Classify the segments of <in-img> described by the
    file <in-segs> (generated by the segmentation step),
    using the training set at location <training-set>
    (created by the convert step). Save the segments along
    with their classes in <out-classes>

semantics <in-classes> <in-refs> <in-stafflines> <out-xml>

    Convert the recognized score to a MusicXML
    document (<out-xml>), from the classes (<in-classes>) output
    by the classification step, the reference lengths (<in-refs>)
    output by the preprocessing step, and the staff lines
    positions (<in-stafflines>, the .txt file) output by the
    staffline step.

generate <in> <out> <name>
    Parse the <in> MusicXML file and generate the song
    <name> in the clojure file <out>

play <file> <ns>/<name>
    play the song called <name> defined in the file <file>, which
    defines the namespace <ns>. If no namespace is defined, only
    <name> can be given

#+END_SRC

Some example data is provided in the [[example]] directory, which contains both input data and (almost) perfect output data. For example, after running the following command:

#+BEGIN_SRC shell
$ lein2 run preprocessing example/input/furelise.png /tmp/furelise.png /tmp/refs.txt
#+END_SRC

The image =/tmp/furelise.png= should (roughly) correspond to the image =example/preprocessed/furelise.png=, and the file =/tmp/refs.txt= to [[example/preprocessed/refs.txt]].

The other possible commands, on the example files, are:

#+BEGIN_SRC shell
$ lein2 run staffline example/preprocessed/furelise.png
$ lein2 run segmentation example/staffline/furelise-0.png example/preprocessed/refs.txt /tmp/segs.txt
$ lein2 run classification training-set/ example/staffline/furelise-0.png example/segmentation/segs.txt /tmp/classes.txt
$ lein2 run semantics example/classification/classes.txt example/preprocessed/refs.txt example/staffline/furelise-0.txt /tmp/furelise-0.xml
$ lein2 run generate example/semantics/furelise-0.xml /tmp/furelise-0.clj furelise
$ lein2 run play example/generate/furelise-0.clj furelise/furelise
#+END_SRC

Note: the =staffline= step stores its output in the same directory as its input.

* How it works
** OMR
The initial plan for the OMR part is documented in [[doc/design/design.pdf]]. In this section we briefly describe how it is currently implemented and discuss some results.
*** Part 1: Image Preprocessing
**** Binarization
Since the input image might be in grayscale or color, a binarization step is sometimes necessary. However, most sheet music found on the internet is already binarized, so this step is only performed when needed.

It is done in two steps:

  1. If the image is in color, convert it into grayscale using the [[http://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale][luma component]], implemented in [[src/overscore/preprocessing/gray.clj][preprocessing/gray.clj]].
  2. Convert the grayscale image into binary using [[http://en.wikipedia.org/wiki/Otsu%27s_method][Otsu's method]], implemented in [[src/overscore/preprocessing/otsu.clj][preprocessing/otsu.clj]].
**** Reference lengths
Once the image is binarized, the reference lengths (the staff line height and the staff space height) are found by analyzing the most common vertical runs (black runs for the staff line height, white runs for the staff space height), as described in [[RebeloFujinaga2012][Rebelo, Fujinaga et al., 2012]].
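As an illustration, this run-length analysis can be sketched as follows (a minimal sketch assuming a binarized =BufferedImage= where black pixels have RGB value 0; the function names are illustrative, the actual implementation lives under =src/overscore/preprocessing/=):

#+BEGIN_SRC clojure
(import '(java.awt.image BufferedImage))

(defn column-runs
  "Lengths of the vertical runs of the given color (:black or :white)
  in column x of a binarized image."
  [^BufferedImage img x color]
  (let [black? (fn [y] (zero? (bit-and (.getRGB img x y) 0xFFFFFF)))
        wanted? (if (= color :black) black? (complement black?))]
    (->> (range (.getHeight img))
         (partition-by wanted?)
         (filter (comp wanted? first))
         (map count))))

(defn reference-length
  "Most common vertical run length of the given color over the whole
  image: :black approximates the staff line height, :white the staff
  space height."
  [^BufferedImage img color]
  (->> (range (.getWidth img))
       (mapcat #(column-runs img % color))
       frequencies
       (apply max-key val)
       key))
#+END_SRC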

**** Improvements
Multiple steps could be added to the preprocessing part. The most useful would be skew detection and correction, since scanned documents are often skewed. Another potentially useful step would be noise reduction, depending on the quality of the scanned documents.
*** Part 2: Staff line processing
**** System isolation
The first step of this part is to isolate the systems, so that they can be processed independently. This is done with a technique similar to the one described in [[Fujinaga1988][Fujinaga, 1988]].

First, a y-projection of the pixels is computed (see an example result in [[overscore/tree/master/doc/yproj.png][doc/yproj.png]]), and the systems are located by looking for five distinct peaks of black pixels. Then, the boundaries of each system are found by looking around it for the rows where the number of black pixels is minimal.

Each system found is then saved in its own image.
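The projection used below can be sketched as follows (assuming the same binarized-image convention as in the preprocessing sketch; the real =projection= function is part of the Overscore sources):

#+BEGIN_SRC clojure
(import '(java.awt.image BufferedImage))

(defn projection
  "Number of black pixels on each row (:y) or column (:x) of a
  binarized image. Systems show up as wide peaks in the :y projection."
  [^BufferedImage img axis]
  (let [black? (fn [x y] (zero? (bit-and (.getRGB img x y) 0xFFFFFF)))
        [outer inner at] (if (= axis :y)
                           [(.getHeight img) (.getWidth img) (fn [i j] (black? j i))]
                           [(.getWidth img) (.getHeight img) (fn [i j] (black? i j))])]
    (vec (for [i (range outer)]
           (count (filter #(at i %) (range inner)))))))
#+END_SRC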

The projection image in doc/yproj.png was generated with:

#+BEGIN_SRC clojure
(def img (ImageIO/read (File. "/home/quentin/p/overscore/data/furelise.png")))
(def p (projection img :y))
(def chart
  (set-background-alpha
   (bar-chart (range (length p)) p :vertical false)
   0))
(.setVisible (.getRangeAxis (.getCategoryPlot chart)) false)
(.setVisible (.getDomainAxis (.getCategoryPlot chart)) false)
(save chart "foo.png" :width 2745 :height 3611)
#+END_SRC

and then assembled with the png of the sheet.

This step is implemented in [[src/overscore/staffline/identification.clj][staffline/identification.clj]].
**** Staff line identification and removal
Staff line removal is done using [[http://gamera.informatik.hsnr.de/index.html][Gamera]]'s [[http://music-staves.sf.net/][music-staves]] plugin. A Python script is called with the input image; it outputs the same image with the staff lines removed, as well as the staff line positions (in a text file).

If no staff lines are found in an image, the image can be discarded, since it is most likely not relevant (e.g. text, such as the title of the piece).
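Schematically, the call from Clojure to the Python script could look like this (a sketch using =clojure.java.shell=; the exact command-line interface of removal.py is an assumption here):

#+BEGIN_SRC clojure
(require '[clojure.java.shell :as shell])

(defn remove-stafflines!
  "Run the Gamera-based staff line removal script on a system image.
  The script's arguments and output conventions are assumed here,
  not taken from the actual script."
  [in-image]
  (let [res (shell/sh "python" "src/overscore/staffline/removal.py" in-image)]
    (when-not (zero? (:exit res))
      (throw (ex-info "Staff line removal failed" res)))))
#+END_SRC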

This step is implemented in [[src/overscore/staffline/removal.clj][staffline/removal.clj]], which calls the script [[src/overscore/staffline/removal.py][staffline/removal.py]].
**** Improvements
If the staff line removal does not work as expected because the image is skewed, a skew correction algorithm should be added to the [[Part 1: Image Preprocessing][preprocessing]] part.
*** Part 3: Symbol Recognition
**** Symbol Segmentation
The segmentation is done in a way similar to [[OpenOMR][OpenOMR]]. All the sub-parts of the segmentation process are assembled in [[src/recognition/segmentation/segmentation.clj][recognition/segmentation/segmentation.clj]], which takes as input a path to an image (output by the preprocessing step) and a path to a text file containing the reference lengths (also output by the preprocessing step). It outputs all the segments, represented as a vector of 4-element vectors, in a text file.
***** Level-0 Segmentation
The first segmentation pass finds runs of consecutive columns that contain black pixels (a sketch follows the list below). The results are refined by:

  1. Grouping nearby segments, which arise for example in the case of a dotted note
  2. Dropping small segments, which are probably due to noise in the scanned image
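A minimal sketch of this pass, assuming a =black-column?= predicate built from the image, with illustrative parameter names (the tuned parameters live in level0.clj):

#+BEGIN_SRC clojure
(defn level0-segments
  "Find [start end] ranges of consecutive columns containing black
  pixels, merge ranges separated by fewer than merge-gap columns,
  and drop ranges narrower than min-width columns."
  [black-column? width merge-gap min-width]
  (let [raw (->> (range width)
                 (partition-by black-column?)
                 (filter (comp black-column? first))
                 (map (fn [cols] [(first cols) (last cols)])))
        merged (reduce (fn [acc [s e]]
                         (let [[ps pe] (peek acc)]
                           (if (and pe (< (- s pe) merge-gap))
                             (conj (pop acc) [ps e])
                             (conj acc [s e]))))
                       [] raw)]
    (filter (fn [[s e]] (>= (inc (- e s)) min-width)) merged)))
#+END_SRC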

The level-0 segmentation is implemented in [[src/overscore/recognition/segmentation/level0.clj][recognition/segmentation/level0.clj]]. An example of level-0 segmentation output can be found in [[overscore/tree/master/doc/level0-segments.png][doc/level0-segments.png]].
***** Note Head Detection
For each level-0 segment, we need to know whether it contains a note head.

To detect whether a segment contains note heads, the following algorithm (taken from [[OpenOMR][OpenOMR]]) is used; a sketch follows the list:

  - For each column, find (if present) the biggest black run that is:
    - smaller than 3/2 of the staff space height
    - bigger than 2 times the staff line height
    Remember the columns where such runs are present.
  - Find segments of columns having those black runs, such that the length of the segment is at least half the staff space height. Those segments correspond to the note heads.
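A sketch of this detection, reusing the hypothetical =column-runs= helper from the preprocessing sketch above:

#+BEGIN_SRC clojure
(defn note-head-columns
  "Columns of the segment [x0 x1] whose biggest black run is bigger
  than twice the staff line height and smaller than 3/2 of the staff
  space height."
  [img [x0 x1] staffline-h staffspace-h]
  (filter (fn [x]
            (when-let [runs (seq (column-runs img x :black))]
              (let [m (apply max runs)]
                (and (> m (* 2 staffline-h))
                     (< m (* 3/2 staffspace-h))))))
          (range x0 (inc x1))))

(defn note-head-segments
  "Ranges of consecutive note-head columns that span at least half a
  staff space; these correspond to the note heads."
  [cols staffspace-h]
  (->> cols
       (reduce (fn [acc x]
                 (let [[s e] (peek acc)]
                   (if (and e (= (inc e) x))
                     (conj (pop acc) [s x])
                     (conj acc [x x]))))
               [])
       (filter (fn [[s e]] (>= (inc (- e s)) (/ staffspace-h 2))))))
#+END_SRC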

Segments containing note heads are further decomposed into multiple level-1 segments. The others can be used directly as level-1 segments, without further decomposition.

The note head detection is implemented in [[src/overscore/recognition/segmentation/notehead.clj][recognition/segmentation/notehead.clj]].
***** Level-1 Segmentation
Level-1 segmentation uses the data computed by the note head detection: for each note head found, it creates a level-1 segment. The space between the note heads is also saved as a level-1 segment.
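Schematically, this amounts to splitting the level-0 column range at the note head boundaries (an illustrative sketch):

#+BEGIN_SRC clojure
(defn level1-segments
  "Split the level-0 column range [x0 x1] into one segment per note
  head plus one segment per gap between or around them. Note heads
  are sorted, disjoint [start end] column ranges."
  [[x0 x1] heads]
  (let [gaps (->> (concat [[nil (dec x0)]] heads [[(inc x1) nil]])
                  (partition 2 1)
                  (keep (fn [[[_ e] [s _]]]
                          (when (>= (dec s) (inc e))
                            [(inc e) (dec s)]))))]
    (sort-by first (concat heads gaps))))

;; (level1-segments [10 30] [[14 18] [22 26]])
;; => ([10 13] [14 18] [19 21] [22 26] [27 30])
#+END_SRC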

Level-1 segmentation is implemented in [[src/overscore/recognition/segmentation/level1.clj][recognition/segmentation/level1.clj]], and an example of its output on level-0 segments that contain notes can be found in [[overscore/tree/master/doc/level1-segments.png][doc/level1-segments.png]].
***** Level-2 Segmentation
The level-2 segmentation separates the symbols contained in each level-1 segment vertically. The resulting segments should then correspond to individual musical features (eg. a note head, a sharp, ...) and can be classified.

Level-2 segmentation is implemented in [[src/recognition/segmentation/level2.clj][recognition/segmentation/level2.clj]].
**** Symbol recognition
Multiple symbol recognition methods are implemented. The one used by default relies on the [[https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm][k-nearest-neighbors algorithm]] provided by [[http://opencv.org/][OpenCV]], using [[http://audiveris.kenai.com/][Audiveris]]' training set.

Since Audiveris stores its training set as XML files describing the vertical runs of each image, we first need to convert it to "normal" binary PNG images (for easier manipulation). This is done in [[src/overscore/tools/audiveris.clj][tools/audiveris.clj]].
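For instance, a column described by alternating run lengths could be decoded as follows (a sketch; the actual XML layout is handled in tools/audiveris.clj, and the assumption that runs start with white is hypothetical):

#+BEGIN_SRC clojure
(defn runs->column
  "Decode a column given as alternating run lengths, assumed to start
  with a white run, into a vector of pixels (1 = black, 0 = white)."
  [run-lengths]
  (vec (mapcat (fn [color len] (repeat len color))
               (cycle [0 1])
               run-lengths)))

;; (runs->column [2 3 1]) => [0 0 1 1 1 0]
#+END_SRC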

The training set is then loaded in [[src/recognition/classification/training.clj][recognition/classification/training.clj]], each image being resized to 20x20 pixels and represented as a vector of 400 integers (1 meaning the pixel is on, i.e. black; 0 meaning it is off).
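This representation can be sketched as follows (same black-pixel convention as in the earlier sketches; the actual resizing code is in training.clj):

#+BEGIN_SRC clojure
(import '(java.awt Image)
        '(java.awt.image BufferedImage))

(defn image->vector
  "Resize an image to 20x20 pixels and flatten it into a vector of
  400 integers: 1 for black pixels, 0 for white ones."
  [^BufferedImage img]
  (let [scaled (.getScaledInstance img 20 20 Image/SCALE_SMOOTH)
        out (BufferedImage. 20 20 BufferedImage/TYPE_INT_RGB)]
    (.drawImage (.createGraphics out) scaled 0 0 nil)
    (vec (for [y (range 20), x (range 20)]
           (if (zero? (bit-and (.getRGB out x y) 0xFFFFFF)) 1 0)))))
#+END_SRC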

OpenCV's k-nearest-neighbors method is called from a C++ program, [[src/recognition/classification/opencv_knn.cpp][recognition/classification/opencv_knn.cpp]], and the resulting binary is invoked from Clojure in [[src/recognition/classification/opencv_knn.clj][recognition/classification/opencv_knn.clj]]. Once OpenCV is installed, the C++ program can be compiled with:

#+BEGIN_SRC shell
$ g++ opencv_knn.cpp -o opencv_knn `pkg-config opencv --libs --cflags`
#+END_SRC

Another kNN classifier, implemented by hand, is available in [[src/recognition/classification/knn.clj][recognition/classification/knn.clj]]; it can use either the [[http://en.wikipedia.org/wiki/Hausdorff_distance][Hausdorff distance]] or the [[https://en.wikipedia.org/wiki/Euclidean_distance][Euclidean distance]] to compare two images. The Hausdorff distance is implemented in [[src/recognition/classification/hausdorff.clj][recognition/classification/hausdorff.clj]], and the Euclidean distance in [[src/recognition/classification/euclidian.clj][recognition/classification/euclidian.clj]]. However, this implementation of the kNN algorithm is very slow, which is why OpenCV's kNN is used by default.
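For reference, the Euclidean variant of such a kNN classifier fits in a few lines (a sketch with illustrative names; the actual code is in knn.clj and euclidian.clj):

#+BEGIN_SRC clojure
(defn euclidean-distance
  "Distance between two images represented as equal-length 0/1 vectors."
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn knn-classify
  "Most common class among the k training examples ([class pixel-vector]
  pairs) closest to the input vector v."
  [k training v]
  (->> training
       (sort-by (fn [[_ tv]] (euclidean-distance tv v)))
       (take k)
       (map first)
       frequencies
       (apply max-key val)
       key))
#+END_SRC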

A neural network classifier using [[http://www.heatonresearch.com/][Encog]] is also implemented, in [[src/recognition/classification/nn.clj][recognition/classification/nn.clj]].

All the parts of the classification step are assembled in [[src/recognition/classification/classification.clj][recognition/classification/classification.clj]], which takes as input the image (output by the preprocessing step) and a file describing the segments (output by the segmentation step), and outputs a file describing the class of each segment, as a vector of 5-element vectors where the first 4 elements are the segment description and the last element is the segment's class (as a symbol).
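For example, such a file might contain entries of the following shape (the interpretation of the four numbers as bounding-box coordinates, and the class names, are assumptions here):

#+BEGIN_SRC clojure
;; [x-start y-start x-end y-end class], one entry per segment
[[120 34 138 52 black_note_head]
 [140 10 146 52 stem]]
#+END_SRC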

**** Improvements
The segmentation might be improved by fine-tuning its parameters. The level-0 and level-1 segmentations work quite accurately, but the level-2 segmentation currently performs poorly.

The symbol recognition process is currently not accurate enough. This might be because a large part (around 25%) of the training set consists of black note heads. That part could be reduced, and the rest of the training set could be improved.

The kNN algorithm implemented by hand also suffers from huge performance issues.

*** TODO Part 4: Musical Semantics
The musical semantics are defined by a set of rules, given by the following LL(1) grammar:

#+BEGIN_SRC text
<P'> → ε
     | <note_body>

<note_body> → sharp
            | flat
            | natural
            | dot_set
#+END_SRC