mirdata icon indicating copy to clipboard operation
mirdata copied to clipboard

[WIP] OpenMIC2018

Open bmcfee opened this issue 3 years ago • 13 comments

This PR adds a dataset class for openmic2018. Fixes #253

The basic functionality is there, but there are a few things left to sort out:

  1. How should we handle the individual (anonymized, disaggregated) annotations?
  2. JAMS conversion is not done yet. This is currently blocked by #541
  3. More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.

bmcfee avatar Nov 09 '21 16:11 bmcfee

I think I don't understand why the loader tests are failing / what the expected behavior is here:

https://github.com/mir-dataset-loaders/mirdata/blob/e591c5411c41591e8606812df869dca1ad52ee0f/tests/test_loaders.py#L323-L355

(Incidentally, this would be a bit cleaner using a fixture to iterate over the datasets, rather than sticking them all in a loop inside the test. You could cut down on prints this way and isolate failures.)

bmcfee avatar Nov 09 '21 17:11 bmcfee

Still not making any headway on the loader tests. The placeholder test fails because it's trying to load metadata from a path that doesn't exist:

   @core.cached_property
    def _metadata(self):
        metadata_path = Path(self.data_home) / "openmic-2018-metadata.csv"
    
        try:
            with open(metadata_path, "r") as fdesc:
                # index column is second to last
                metadata = pd.read_csv(fdesc, index_col=-2)
        except FileNotFoundError as exc:
>           raise FileNotFoundError(
                f"Metadata file {metadata_path} not found. " "Did you run .download?"
            ) from exc
E           FileNotFoundError: Metadata file not/a/real/path/openmic2018/openmic-2018-metadata.csv not found. Did you run .download?

This seems to me like exactly what should happen? Clearly I'm missing something here: why is this expected to work?

bmcfee avatar Nov 10 '21 12:11 bmcfee

Bumping this one -- lil help?

bmcfee avatar Nov 24 '21 14:11 bmcfee

Hello @bmcfee! The problem is in mirdata/datasets/openmic2018.py, lines 159-160:

# -- set the split
self.split = self._track_metadata.get("split") 

The problem is in loading the metadata file in the Track class init. The issue could be fixed by converting this self.split attribute into a track property just as artist or title in your loader are defined. Hope this helps!

genisplaja avatar Nov 25 '21 09:11 genisplaja

Thanks @genisplaja, that's easy enough to fix. I still don't understand what this test is trying to accomplish though: it seems like it ought to be an intentional failure.

bmcfee avatar Nov 25 '21 12:11 bmcfee

Codecov Report

Merging #544 (c125594) into master (c4759b4) will increase coverage by 0.02%. The diff coverage is 97.91%.

@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.59%   96.61%   +0.02%     
==========================================
  Files          48       49       +1     
  Lines        5908     6004      +96     
==========================================
+ Hits         5707     5801      +94     
- Misses        201      203       +2     

codecov[bot] avatar Nov 25 '21 15:11 codecov[bot]

Might be a good time to revive this one.. is there anything that needs doing from my side here?

bmcfee avatar Apr 17 '22 14:04 bmcfee

Attempting to bump this one again

bmcfee avatar Jun 08 '22 12:06 bmcfee

image

bmcfee avatar Jul 07 '22 15:07 bmcfee

CI failures here seem to be unrelated to this PR. I've fetched upstream and rebased, so not sure what's going on here (unless master is broken).

bmcfee avatar Jul 18 '22 14:07 bmcfee

there is an error in one of the existing tests in the master branch indeed. I will look into it this week.

nkundiushuti avatar Jul 19 '22 14:07 nkundiushuti

I have fixed the problem on the main branch. so you can update this PR with the master

nkundiushuti avatar Jul 22 '22 08:07 nkundiushuti

if no one objects, I propose merging this into main branch. pandas is already installed as a dependency of jams

nkundiushuti avatar Aug 08 '22 09:08 nkundiushuti

Thanks for merging this - I do think there are still some unresolved issues to clear up as noted in my original post:

  1. How should we handle the individual (anonymized, disaggregated) annotations?
  2. JAMS conversion is not done yet. This is currently blocked by https://github.com/mir-dataset-loaders/mirdata/issues/541
  3. More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.

bmcfee avatar Sep 21 '22 17:09 bmcfee

hi Brian! could you add 1 and 3 as issues so we keep track of them?

nkundiushuti avatar Sep 22 '22 08:09 nkundiushuti

@nkundiushuti to be honest, this PR languished for so long that I don't have a clear recollection of the issues.

For (3), I don't think I ever fully understood what the tests were, or what else could/should be added. I think this should be more on the maintainers to decide rather than contributors.

For (1), the question is pretty well described in #253 - I'd suggest re-opening that issue until it's entirely resolved. There are also some other finnicky points in #253 that I raised for discussion, but never received any feedback on (eg erroring out when users request random splits).

bmcfee avatar Sep 28 '22 19:09 bmcfee