fma
Known issues (and next release)
Below are issues affecting the `rc1` data release that cannot be fixed without a data update. As updating is disruptive (it'll break code and make results non-comparable), it should be done sparingly, e.g., to fix a fatal flaw or many small ones discovered over time.
- zip decompression fails because of unsupported bzip2 compression (#5)
  - [x] workaround (`master`): note in README to try with 7zip (5700859)
  - [ ] fix (`next`): zip with deflate (instead of bzip2) (#5) or zstd (#32)
- excerpts shorter than 30s and erroneous audio length metadata (#4, #8, #36, #44)
  - [x] workaround (`master`): small subset's list, medium subset's list (#8)
  - [x] fix (`next`): metadata from mp3 not API, ensure 30s (8077afe, 00d5b71, 840b337)
- erroneous ID3 tags (#27)
  - [x] workaround (`master`): list (#27)
  - [ ] fix (`next`): dump ID3 tags with technical metadata and remove from mp3
- exact duplicate tracks (#23)
  - [ ] workaround (`master`): list the 937 duplicates
  - [ ] fix (`next`): remove them (try other methods and detect near duplicates)
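
For the exact-duplicates item, one possible way to build such a list is to group files by a hash of their bytes. This is only a sketch: the issue does not say how the 937 duplicates were found, and the function name `find_exact_duplicates`, the `*.mp3` pattern, and the directory layout are assumptions made for illustration. Byte-identical hashing will not catch near duplicates (e.g., re-encoded audio).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root, pattern="*.mp3"):
    """Group files under `root` whose contents are byte-identical.

    Hypothetical helper: returns a list of groups, each a list of
    paths sharing the same digest (only groups with 2+ files).
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        groups[file_sha256(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group could then be reduced to a single kept track, with the rest written to a duplicates list.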
Workarounds are explained in more detail in the wiki.
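
As an illustration of the proposed deflate fix for the zip issue, an existing bzip2-compressed archive could be repacked with Python's standard `zipfile` module. This is a minimal sketch, not the code used to build the release; the function name `repack_with_deflate` and the file paths are hypothetical. Reading the source archive requires a Python build with the `bz2` module.

```python
import zipfile


def repack_with_deflate(src_path, dst_path):
    """Copy every member of a zip archive into a new deflate-compressed zip."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Build a fresh entry so bzip2-specific header fields
            # (e.g., the required extractor version) are not carried over.
            fresh = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            fresh.external_attr = info.external_attr
            fresh.compress_type = zipfile.ZIP_DEFLATED
            # Each member is read fully into memory before being rewritten.
            dst.writestr(fresh, src.read(info))
```

Since members are loaded one at a time, memory use is bounded by the largest file in the archive, not by the archive size.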
Branches:
- The `master` branch contains the code that produced the latest released data. The usage code and documentation can be updated but should work with the released data.
- The `next` branch contains the code to produce the hypothetical next release of the data. The usage code is updated to any new data format.
- The `outputs` branch is based on `master` and contains generated data (e.g., notebook outputs and figures) for convenience (most notably to run on binder).
Potential todos for a dataset update:
- [ ] schema with type and explanation of every field (#14)
- [ ] Data format: relational tables (`tracks.csv`, `artists.csv`, `albums.csv`) instead of a single huge `tracks.csv`? Consider standards like JAMS.
- [ ] Since https://freemusicarchive.org was acquired, it has become a static archive. It might make sense to dump it one last time.
- [ ] Recompute all features available in the latest librosa (#37).
- [ ] Consider another hosting provider (#26) or torrents (#32). It would be nice to have storage guarantees and a DOI. Maybe Zenodo would now accept it (they wouldn't in 2017).
- [ ] Add to additional dataset lists (#35).
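
To make the relational-tables idea above concrete, here is a rough sketch of splitting flat per-track rows into separate track, artist, and album tables. All column names (`track_id`, `artist_name`, `album_title`, etc.) are hypothetical and do not reflect the actual `tracks.csv` layout; the point is only that artist and album fields get stored once and are referenced by ID.

```python
import csv


def split_relational(rows):
    """Split flat track rows (dicts) into tracks, artists, and albums tables.

    Column names are assumptions for illustration only.
    """
    artists, albums, tracks = {}, {}, []
    for row in rows:
        # Artist and album fields are deduplicated by their ID.
        artists[row["artist_id"]] = {
            "artist_id": row["artist_id"],
            "artist_name": row["artist_name"],
        }
        albums[row["album_id"]] = {
            "album_id": row["album_id"],
            "album_title": row["album_title"],
            "artist_id": row["artist_id"],
        }
        # Tracks keep only their own fields plus foreign keys.
        tracks.append({
            "track_id": row["track_id"],
            "track_title": row["track_title"],
            "album_id": row["album_id"],
            "artist_id": row["artist_id"],
        })
    return tracks, list(artists.values()), list(albums.values())


def write_table(path, table):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(table[0].keys()))
        writer.writeheader()
        writer.writerows(table)
```

The three tables can be joined back on `artist_id` and `album_id` when a flat view is needed.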