

LAKH MuseNet MIDI Dataset


Full LAKH MIDI dataset converted to MuseNet MIDI output format (9 instruments + drums)

Bonus: Choir on Channel 10


Please respect the CC BY-NC-SA license.


Make your own with the Colab notebook, or download the converted output here:


https://1drv.ms/u/s!Ao9gnMkvUA2KgZBWDIQJIG-JS6RpPQ?e=ur8ggN


Download with wget (the leading ! runs the command from a Colab/Jupyter cell):

!wget --no-check-certificate -O LAKH-MuseNet-MIDI-Dataset.zip "https://onedrive.live.com/download?cid=8A0D502FC99C608F&resid=8A0D502FC99C608F%2118520&authkey=AN-gn1ZxEnO4khE"
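
Once the archive is downloaded, it can be unpacked from Python. The following is a minimal sketch, not part of the original project: it assumes the archive is saved as LAKH-MuseNet-MIDI-Dataset.zip (the name used by the wget command above) and that the MIDI files inside end in .mid or .midi. The internal folder layout of the archive is not documented here, so the code filters by extension rather than by path. The function names and the output directory are hypothetical.

```python
import zipfile
from pathlib import Path


def list_midi_members(archive_path):
    """Return the names of archive members that look like MIDI files."""
    with zipfile.ZipFile(archive_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".mid", ".midi"))
        ]


def extract_midi_files(archive_path, out_dir):
    """Extract only the MIDI members into out_dir and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    members = list_midi_members(archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(out_dir, members=members)
    return [out_dir / name for name in members]


if __name__ == "__main__":
    # Archive name taken from the wget command; output directory is an assumption.
    archive = Path("LAKH-MuseNet-MIDI-Dataset.zip")
    if archive.exists():
        paths = extract_midi_files(archive, "LAKH-MuseNet-MIDI-Dataset")
        print(f"Extracted {len(paths)} MIDI files")
```

Listing the members first makes it easy to check how many files the archive holds before committing disk space to a full extraction.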

Source license/attribution (quoted from the Lakh MIDI Dataset page by Colin Raffel)

The Lakh MIDI Dataset is distributed with a CC-BY 4.0 license; if you use this data in any capacity, please reference this page and my thesis:

Colin Raffel. "Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching". PhD Thesis, 2016.

Of course, I did not transcribe any of the MIDI files in the Lakh MIDI Dataset. While MIDI files have a built-in mechanism for attribution (the Copyright meta-event), it is not used consistently, so attributing each of the MIDI files in the dataset to a particular author is not feasible. If you'd like to try, here is a list of the text of all of the Copyright meta-events in the Lakh MIDI Dataset.

If you use the Million Song Dataset, please reference this paper:

Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. "The Million Song Dataset". In Proceedings of the 12th International Society for Music Information Retrieval Conference, pages 591–596, 2011.


Project Los Angeles

Tegridy Code 2022