Known issues (and next release)

Open • mdeff opened this issue 4 years ago • 2 comments

Below are issues affecting the rc1 data release that cannot be fixed without a data update. As updating is disruptive (it'll break code and make results non-comparable), it should be done sparingly, e.g., to fix a fatal flaw or many small ones discovered over time.

  • zip decompression fails because of unsupported bzip2 compression (#5)
    • [x] workaround (master): note in README to try with 7zip (5700859)
    • [ ] fix (next): zip with deflate (instead of bzip2) (#5) or zstd (#32)
  • excerpts shorter than 30s and erroneous audio length metadata (#4, #8, #36, #44)
  • erroneous ID3 tags (#27)
    • [x] workaround (master): list (#27)
    • [ ] fix (next): dump ID3 tags with technical metadata and remove from mp3
  • exact duplicate tracks (#23)
    • [ ] workaround (master): list the 937 duplicates
    • [ ] fix (next): remove them (try other methods and detect near duplicates)
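
For the zip item above, a minimal sketch of how one could check which compression method an archive uses and repack it with deflate, using only Python's standard `zipfile` module. The function names and the archive paths are placeholders for illustration; this is not the script used to build the release, and reading bzip2 members assumes the Python build includes the `bz2` module.

```python
import zipfile

# Human-readable names for the compression constants known to zipfile.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflate",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def compression_methods(path):
    """Return the set of compression method names used by the archive members."""
    with zipfile.ZipFile(path) as archive:
        return {
            METHOD_NAMES.get(info.compress_type, str(info.compress_type))
            for info in archive.infolist()
        }


def repack_with_deflate(src, dst):
    """Copy every member of src into a new archive dst, compressed with deflate."""
    with zipfile.ZipFile(src) as old, zipfile.ZipFile(
        dst, "w", compression=zipfile.ZIP_DEFLATED
    ) as new:
        for info in old.infolist():
            # read() decompresses the member, whatever method it was stored with.
            new.writestr(info.filename, old.read(info))


# Hypothetical usage (placeholder file names):
# print(compression_methods("archive.zip"))
# repack_with_deflate("archive.zip", "archive_deflate.zip")
```

Each member is read fully into memory before being rewritten, which is simple but may not suit very large files.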

Workarounds are explained in more detail in the wiki.
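
For the exact-duplicates item, one simple way to build such a list is to group files by a hash of their bytes. The sketch below is an assumption about how this could be done, not necessarily how the 937 duplicates were found; `find_exact_duplicates` and the directory name are hypothetical. It only catches byte-identical files, so two files with the same audio but different ID3 tags would not match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root, pattern="*.mp3"):
    """Group files under root whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        # Hash the whole file; identical bytes give identical digests.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]


# Hypothetical usage (placeholder directory name):
# for group in find_exact_duplicates("audio_dir"):
#     print([p.name for p in group])
```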

mdeff, Jun 13 '20 01:06

Branches:

  • The master branch contains the code that produced the latest released data. The usage code and documentation can be updated but should work with the released data.
  • The next branch contains the code to produce the hypothetical next release of the data. The usage code is updated to match any new data format.
  • The outputs branch is based on master and contains generated data (e.g., notebook outputs and figures) for convenience (most notably to run on binder).

mdeff, Jun 17 '20 00:06

Potential todos for a dataset update:

  • [ ] schema with type and explanation of every field (#14)
  • [ ] Data format: relational tables (tracks.csv, artists.csv, albums.csv) instead of a single huge tracks.csv? Consider standards like JAMS.
  • [ ] Since https://freemusicarchive.org was acquired, it has become a static archive. It might make sense to dump it one last time.
  • [ ] Recompute all features available in the latest librosa (#37).
  • [ ] Consider another hosting provider (#26) or torrents (#32). It would be nice to have storage guarantees and a DOI. Maybe Zenodo would now accept it (they wouldn't in 2017).
  • [ ] Add to additional dataset lists (#35).
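
To make the relational-tables idea concrete, here is a rough sketch of splitting flat per-track rows into separate track, artist, and album tables. All column names (`track_id`, `artist_name`, `album_title`, and so on) are hypothetical and chosen for illustration; they are not the actual columns of tracks.csv.

```python
def split_tracks(rows):
    """Split flat track rows into (tracks, artists, albums) tables.

    Each row is a dict with hypothetical keys: track_id, track_title,
    artist_id, artist_name, album_id, album_title.
    """
    tracks, artists, albums = [], {}, {}
    for row in rows:
        # Artist and album attributes are stored once, keyed by their id.
        artists[row["artist_id"]] = {
            "artist_id": row["artist_id"],
            "artist_name": row["artist_name"],
        }
        albums[row["album_id"]] = {
            "album_id": row["album_id"],
            "album_title": row["album_title"],
            "artist_id": row["artist_id"],
        }
        # The track table keeps only its own fields plus foreign keys.
        tracks.append({
            "track_id": row["track_id"],
            "track_title": row["track_title"],
            "album_id": row["album_id"],
            "artist_id": row["artist_id"],
        })
    return tracks, list(artists.values()), list(albums.values())
```

The three returned lists could then be written out with `csv.DictWriter` as tracks.csv, artists.csv, and albums.csv.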

mdeff, Jun 17 '20 01:06