[WIP] OpenMIC2018
This PR adds a dataset class for openmic2018. Fixes #253
The basic functionality is there, but there are a few things left to sort out:
- How should we handle the individual (anonymized, disaggregated) annotations?
- JAMS conversion is not done yet. This is currently blocked by #541
- More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.
I don't understand why the loader tests are failing, or what the expected behavior is here:
https://github.com/mir-dataset-loaders/mirdata/blob/e591c5411c41591e8606812df869dca1ad52ee0f/tests/test_loaders.py#L323-L355
(Incidentally, this would be a bit cleaner with a parametrized fixture that iterates over the datasets, rather than putting them all in a loop inside one test. You could cut down on prints that way and isolate failures; see the sketch below.)
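Something like this is what I have in mind (just a sketch, assuming the public mirdata.list_datasets() / mirdata.initialize() API and the usual tests/resources layout):

```python
import pytest

import mirdata


# Sketch: parametrize over dataset names so each loader becomes its own
# test case instead of one loop with prints.
@pytest.mark.parametrize("dataset_name", mirdata.list_datasets())
def test_loader_basics(dataset_name):
    # The data_home path here is an assumption about the test resources layout.
    dataset = mirdata.initialize(
        dataset_name, data_home=f"tests/resources/mir_datasets/{dataset_name}"
    )
    assert dataset.name == dataset_name
    assert callable(dataset.download)
```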
Still not making any headway on the loader tests. The placeholder test fails because it's trying to load metadata from a path that doesn't exist:
```
@core.cached_property
def _metadata(self):
    metadata_path = Path(self.data_home) / "openmic-2018-metadata.csv"
    try:
        with open(metadata_path, "r") as fdesc:
            # index column is second to last
            metadata = pd.read_csv(fdesc, index_col=-2)
    except FileNotFoundError as exc:
>       raise FileNotFoundError(
            f"Metadata file {metadata_path} not found. " "Did you run .download?"
        ) from exc
E       FileNotFoundError: Metadata file not/a/real/path/openmic2018/openmic-2018-metadata.csv not found. Did you run .download?
```
This seems to me like exactly what should happen? Clearly I'm missing something here: why is this expected to work?
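If the missing-metadata error is the intended behavior, I would have expected the test to assert it explicitly, along these lines (just a sketch; the dataset fixture here is made up):

```python
import pytest


def test_metadata_requires_download(dataset):
    # "dataset" is a hypothetical fixture initialized with a placeholder
    # data_home; accessing the cached metadata should raise until
    # .download() has been run.
    with pytest.raises(FileNotFoundError):
        dataset._metadata
```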
Bumping this one -- lil help?
Hello @bmcfee! The problem is in mirdata/datasets/openmic2018.py, lines 159-160:
```python
# -- set the split
self.split = self._track_metadata.get("split")
```
The problem is in loading the metadata file in the Track class __init__. The issue could be fixed by converting this self.split attribute into a track property, just as artist or title are defined in your loader. Hope this helps!
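For example, something like this (just a sketch; it assumes the same _track_metadata helper that artist and title already use):

```python
class Track(core.Track):
    # ... existing __init__, without the eager self.split assignment ...

    @property
    def split(self):
        # Resolved lazily from the metadata, like artist and title,
        # so __init__ no longer needs the metadata file to exist.
        return self._track_metadata.get("split")
```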
Thanks @genisplaja, that's easy enough to fix. I still don't understand what this test is trying to accomplish though: it seems like it ought to be an intentional failure.
Codecov Report
Merging #544 (c125594) into master (c4759b4) will increase coverage by 0.02%. The diff coverage is 97.91%.
```diff
@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.59%   96.61%   +0.02%
==========================================
  Files          48       49       +1
  Lines        5908     6004      +96
==========================================
+ Hits         5707     5801      +94
- Misses        201      203       +2
```
Might be a good time to revive this one. Is there anything that needs doing from my side here?
Attempting to bump this one again
CI failures here seem to be unrelated to this PR. I've fetched upstream and rebased, so not sure what's going on here (unless master is broken).
There is indeed an error in one of the existing tests on the master branch. I will look into it this week.
I have fixed the problem on the main branch, so you can update this PR with master.
If no one objects, I propose merging this into the main branch. pandas is already installed as a dependency of jams.
Thanks for merging this - I do think there are still some unresolved issues to clear up as noted in my original post:
- How should we handle the individual (anonymized, disaggregated) annotations?
- JAMS conversion is not done yet. This is currently blocked by https://github.com/mir-dataset-loaders/mirdata/issues/541
- More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.
Hi Brian! Could you add 1 and 3 as issues so we can keep track of them?
@nkundiushuti to be honest, this PR languished for so long that I don't have a clear recollection of the issues.
For (3), I don't think I ever fully understood what the existing tests cover, or what else could/should be added. I think this is more for the maintainers to decide than for contributors.
For (1), the question is pretty well described in #253 - I'd suggest re-opening that issue until it's entirely resolved. There are also some other finicky points in #253 that I raised for discussion but never received any feedback on (e.g., erroring out when users request random splits).