lakh-pianoroll-dataset
A collection of 174,154 multi-track piano-rolls
Source Code for Deriving Lakh Pianoroll Dataset (LPD)
The dataset derived with the default settings is available here.
- Download the Lakh MIDI Dataset (LMD) with the following script.
  ./scripts/download_lmd.sh
  (Or download it manually here.)
- Set the variables LMD_ROOT and LPD_ROOT in run.sh and the variables in config.py to proper values.
- Derive all subsets and versions of LPD, along with matched_ids.txt and cleansed_ids.txt, with the following script.
  ./scripts/derive_lpd.sh
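Before starting the derivation, it can help to confirm that the two root directories are usable. The following is only a sketch: the helper name check_roots is hypothetical and not part of this repository, and it assumes LMD_ROOT points at the downloaded LMD while LPD_ROOT is an output location that may not exist yet.

```shell
# Hypothetical pre-flight helper (not part of the repository scripts).
# $1: directory holding the downloaded LMD (assumed to be the value of LMD_ROOT)
# $2: directory that will receive the derived LPD (assumed to be LPD_ROOT)
check_roots() {
    # Fail early if the source dataset directory is missing.
    if [ ! -d "$1" ]; then
        echo "LMD_ROOT does not exist: $1" >&2
        return 1
    fi
    # Create the output directory (and any parents) if it is not there yet.
    mkdir -p "$2"
}
```

For example, check_roots "$LMD_ROOT" "$LPD_ROOT" && ./scripts/derive_lpd.sh runs the derivation only when the check passes, assuming both variables are set in the current shell.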
Derive the labels for the LPD
The derived labels can be found at data/labels.tar.gz.
- Download the labels with the following script.
  ./scripts/download_labels.sh
- Derive the labels with the following script.
  ./scripts/derive_labels.sh
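If the labels archive needs to be unpacked by hand, a small wrapper around tar is enough. This is a sketch under assumptions: the helper name unpack_labels and the destination directory are hypothetical, and the layout of the files inside the archive is not described here.

```shell
# Hypothetical helper for unpacking a gzip-compressed tar archive.
# $1: path to the archive, e.g. data/labels.tar.gz
# $2: destination directory (an assumed location, created if missing)
unpack_labels() {
    # Refuse to continue when the archive is not present.
    if [ ! -f "$1" ]; then
        echo "archive not found: $1" >&2
        return 1
    fi
    mkdir -p "$2"
    # -x extract, -z gunzip, -f archive file, -C change to the destination first.
    tar -xzf "$1" -C "$2"
}
```

A possible call is unpack_labels data/labels.tar.gz data/labels, where data/labels is only an illustrative destination.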
Synthesize audio files for the LPD
- Install GNU Parallel to run the synthesizer in parallel mode.
- Synthesize audio files from multitrack pianorolls with the following script.
  ./scripts/batch_synthesize.sh ./data/lpd/lpd/lpd_cleansed/ \
      ./data/synthesized/lpd_cleansed 20
  (The above command synthesizes all the multitrack pianorolls in the LPD-cleansed subset with 20 parallel jobs.)
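Since the batch script relies on GNU Parallel, a quick dependency check before a long run may save time. The sketch below uses a hypothetical helper named require_cmd, which is not part of the repository; it assumes GNU Parallel is installed under its usual executable name, parallel.

```shell
# Hypothetical helper: succeed only if the named program is on PATH.
require_cmd() {
    # command -v is the POSIX way to look up a command without running it.
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing dependency: $1" >&2
        return 1
    fi
}
```

For example, require_cmd parallel && ./scripts/batch_synthesize.sh ./data/lpd/lpd/lpd_cleansed/ ./data/synthesized/lpd_cleansed 20 starts the synthesis only when the parallel executable is found.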