midi_degradation_toolkit

Before release: DQA of the released data!

Open JamesOwers opened this issue 6 years ago • 4 comments

We should do some data quality analysis of the data we are going to release. I'm thinking of a notebook (which would also double as an introduction to the data available for use) that reviews the data by:

  • Playing a selection of degraded and clean excerpts
    • Any issues with the data? Choppy? Did flattening tracks work well?
    • Are degradations obvious? Are there better parameters to use for the degradations?
  • Providing stats about the number of notes in those excerpts, note lengths, the actual amount of time during which notes are sounding, etc.
    • This will inform the correct seq_len to use for models (it may be worth excluding excessively long excerpts)
  • Giving some background on where these data come from and, if possible, some summary stats about genre, tempo, or whatever else we can glean
  • Summarising performance broken down by dataset (info available in the metadata)

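A minimal sketch of the per-excerpt statistics listed above, in plain Python. It assumes each excerpt is a list of notes with `onset`, `track`, `pitch` and `dur` fields in milliseconds; the `Note` class and the helper functions are hypothetical illustrations, not part of the toolkit's API. Sounding time is computed as the union of note intervals, so overlapping notes are not double-counted, and the seq_len suggestion takes a nearest-rank percentile of the note counts.

```python
from dataclasses import dataclass


@dataclass
class Note:
    # Hypothetical note record; times assumed to be in milliseconds.
    onset: int
    track: int
    pitch: int
    dur: int


def sounding_time(notes):
    """Total time during which at least one note sounds (union of intervals)."""
    intervals = sorted((n.onset, n.onset + n.dur) for n in notes)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap before this note: close off the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def excerpt_stats(notes):
    """Note count, mean note length, total span and sounding time of one excerpt."""
    if not notes:
        return {"n_notes": 0, "mean_dur": 0.0, "span": 0, "sounding": 0}
    start = min(n.onset for n in notes)
    end = max(n.onset + n.dur for n in notes)
    return {
        "n_notes": len(notes),
        "mean_dur": sum(n.dur for n in notes) / len(notes),
        "span": end - start,
        "sounding": sounding_time(notes),
    }


def suggest_seq_len(note_counts, pct=95):
    """Nearest-rank percentile of per-excerpt note counts (hypothetical heuristic)."""
    if not note_counts:
        raise ValueError("need at least one excerpt")
    ranked = sorted(note_counts)
    # Integer ceil(pct * n / 100), avoiding float rounding.
    rank = max(1, (pct * len(ranked) + 99) // 100)
    return ranked[rank - 1]
```

Comparing `sounding` with `span` shows how much of an excerpt is silence, and excerpts whose note counts sit far above the chosen percentile are candidates for exclusion.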
Essentially I want to check that the data are not rubbish, and we can hear where the degradations are!

JamesOwers, Oct 09 '19 09:10

Make sure to check for very short notes (which may have been introduced by the overlap checks).
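A rough sketch of that check, assuming the same hypothetical millisecond-based `Note` record and an arbitrary 50 ms threshold (both are illustrative choices, not toolkit values). The second function is a heuristic that assumes the overlap check truncates the earlier of two overlapping same-pitch notes on a track so that it ends exactly at the later note's onset; short notes matching that pattern are the likeliest artefacts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Note:
    # Hypothetical note record; times assumed to be in milliseconds.
    onset: int
    track: int
    pitch: int
    dur: int


def find_short_notes(notes, min_dur_ms=50):
    """All notes shorter than the (hypothetical) threshold, in input order."""
    return [n for n in notes if n.dur < min_dur_ms]


def likely_overlap_truncations(notes, min_dur_ms=50):
    """Short notes that end exactly where the next same-pitch note on the same track starts."""
    by_key = defaultdict(list)
    for n in notes:
        by_key[(n.track, n.pitch)].append(n)
    flagged = []
    for group in by_key.values():
        group.sort(key=lambda n: n.onset)
        # Compare each note with the next note of the same track and pitch.
        for cur, nxt in zip(group, group[1:]):
            if cur.dur < min_dur_ms and cur.onset + cur.dur == nxt.onset:
                flagged.append(cur)
    return flagged
```

Listening to the flagged excerpts, rather than only counting them, would show whether these truncated notes are audible.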

apmcleod, Oct 09 '19 10:10

I think it would be best to add this as a notebook to the ACME repo too. Keeping the issue open here. Do this in conjunction with #129.

JamesOwers, Aug 12 '20 12:08

Can be closed.

apmcleod, Oct 19 '20 09:10

Actually, I'd like to keep this open. I've had a basic look at the data, but haven't yet addressed the specific points in the description.

JamesOwers, Oct 19 '20 10:10