midi_degradation_toolkit
Before release: DQA of the released data!
We should do some data quality analysis of the data we are going to release. I'm thinking a notebook (also doubles as an intro to what data are available for use) which reviews the data by:
- Playing a selection of degraded and clean excerpts
- Any issues with data? Choppy? Did flattening tracks work well?
- Are degradations obvious? Are there better parameters for degradations to use?
- providing stats about the number of notes in those excerpts, the lengths of the notes, the total time span the notes cover, etc.
- This will inform the correct seq_len to use for models (may be worth excluding silly long excerpts)
- giving some background as to where these data are from and, if possible, some summary stats about genre, or tempo, or whatever we can glean
- Summarise performance broken down over datasets (info available in metadata)
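A rough sketch of the per-excerpt stats could look like the following. It assumes each excerpt is a pandas DataFrame with `onset`, `pitch` and `dur` columns (times in milliseconds); the column names, the toy excerpts and the 95th-percentile rule for picking a seq_len are illustrative assumptions, not something decided in this issue.

```python
import pandas as pd


def excerpt_stats(notes: pd.DataFrame) -> dict:
    """Summarise one excerpt: note count, note lengths, and covered time span.

    Assumes columns `onset` and `dur` (both in ms); this layout is an assumption.
    """
    if notes.empty:
        return {"n_notes": 0, "mean_dur": 0.0, "max_dur": 0, "span": 0}
    start = notes["onset"].min()
    end = (notes["onset"] + notes["dur"]).max()
    return {
        "n_notes": len(notes),
        "mean_dur": float(notes["dur"].mean()),
        "max_dur": int(notes["dur"].max()),
        "span": int(end - start),  # time from first onset to last offset
    }


# Toy excerpts standing in for the real data (hypothetical values).
excerpts = [
    pd.DataFrame({"onset": [0, 500, 1000], "pitch": [60, 62, 64], "dur": [500, 500, 1000]}),
    pd.DataFrame({"onset": [0, 250], "pitch": [60, 67], "dur": [250, 250]}),
]

stats = pd.DataFrame([excerpt_stats(e) for e in excerpts])

# One possible rule: pick a seq_len that covers 95% of excerpts,
# so that a few very long excerpts do not dictate the model input size.
seq_len_candidate = int(stats["n_notes"].quantile(0.95))
print(stats)
print("candidate seq_len:", seq_len_candidate)
```

Plotting a histogram of `n_notes` over the full dataset would show whether a cut-off like this actually excludes only the silly long excerpts.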
Essentially I want to check that the data are not rubbish, and we can hear where the degradations are!
Make sure to check for very short notes (which may have been introduced by the overlap checks).
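Something like this could flag the very short notes. Again it assumes a DataFrame of notes with a `dur` column in milliseconds, and the 50 ms threshold is a placeholder to be tuned by listening, not an agreed value.

```python
import pandas as pd

# Hypothetical threshold (ms); anything shorter is treated as suspicious.
MIN_DUR_MS = 50


def find_short_notes(notes: pd.DataFrame, min_dur: int = MIN_DUR_MS) -> pd.DataFrame:
    """Return the rows whose duration is below `min_dur`."""
    return notes[notes["dur"] < min_dur]


# Toy excerpt: the middle note is the kind of sliver an overlap fix might leave behind.
notes = pd.DataFrame(
    {"onset": [0, 480, 500], "pitch": [60, 60, 60], "dur": [480, 20, 500]}
)

short = find_short_notes(notes)
print(f"{len(short)} of {len(notes)} notes are shorter than {MIN_DUR_MS} ms")
print(short)
```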
I think it will be best to add this as a notebook to the ACME repo too. Keeping issue here. Do this in conjunction with #129
Can be closed.
Actually, I'd like to keep this open. I've taken a basic look at the data, but haven't addressed the specific things in the description.