[WIP] OpenMIC2018
This PR adds a dataset class for openmic2018. Fixes #253
The basic functionality is there, but there are a few things left to sort out:
- How should we handle the individual (anonymized, disaggregated) annotations?
- JAMS conversion is not done yet. This is currently blocked by #541
- More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.
I don't understand why the loader tests are failing, or what the expected behavior is here:
https://github.com/mir-dataset-loaders/mirdata/blob/e591c5411c41591e8606812df869dca1ad52ee0f/tests/test_loaders.py#L323-L355
(Incidentally, this would be a bit cleaner with a parametrized fixture that iterates over the datasets, rather than putting them all in a loop inside one test. You could cut down on prints that way and isolate failures; see the sketch below.)
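Something like this is what I have in mind (just a sketch, assuming the public mirdata.list_datasets() / mirdata.initialize() API and the usual tests/resources layout):

```python
import pytest

import mirdata


# Sketch: parametrize over dataset names so each loader becomes its own
# test case instead of one loop with prints.
@pytest.mark.parametrize("dataset_name", mirdata.list_datasets())
def test_loader_basics(dataset_name):
    # The data_home path here is an assumption about the test resources layout.
    dataset = mirdata.initialize(
        dataset_name, data_home=f"tests/resources/mir_datasets/{dataset_name}"
    )
    assert dataset.name == dataset_name
    assert callable(dataset.download)
```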
Still not making any headway on the loader tests. The placeholder test fails because it's trying to load metadata from a path that doesn't exist:
```
@core.cached_property
def _metadata(self):
    metadata_path = Path(self.data_home) / "openmic-2018-metadata.csv"
    try:
        with open(metadata_path, "r") as fdesc:
            # index column is second to last
            metadata = pd.read_csv(fdesc, index_col=-2)
    except FileNotFoundError as exc:
>       raise FileNotFoundError(
            f"Metadata file {metadata_path} not found. " "Did you run .download?"
        ) from exc
E       FileNotFoundError: Metadata file not/a/real/path/openmic2018/openmic-2018-metadata.csv not found. Did you run .download?
```
This seems to me like exactly what should happen? Clearly I'm missing something here: why is this expected to work?
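If the missing-metadata error is the intended behavior, I would have expected the test to assert it explicitly, along these lines (just a sketch; the dataset fixture here is made up):

```python
import pytest


def test_metadata_requires_download(dataset):
    # "dataset" is a hypothetical fixture initialized with a placeholder
    # data_home; accessing the cached metadata should raise until
    # .download() has been run.
    with pytest.raises(FileNotFoundError):
        dataset._metadata
```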
Bumping this one -- lil help?
Hello @bmcfee! The problem is in mirdata/datasets/openmic2018.py, lines 159-160:
```python
# -- set the split
self.split = self._track_metadata.get("split")
```
The problem is in loading the metadata file in the Track class __init__. The issue could be fixed by converting this self.split attribute into a track property, just as artist or title are defined in your loader. Hope this helps!
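For example, something like this (just a sketch; it assumes the same _track_metadata helper that artist and title already use):

```python
class Track(core.Track):
    # ... existing __init__, without the eager self.split assignment ...

    @property
    def split(self):
        # Resolved lazily from the metadata, like artist and title,
        # so __init__ no longer needs the metadata file to exist.
        return self._track_metadata.get("split")
```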
Thanks @genisplaja, that's easy enough to fix. I still don't understand what this test is trying to accomplish though: it seems like it ought to be an intentional failure.
Codecov Report
Merging #544 (c125594) into master (c4759b4) will increase coverage by 0.02%. The diff coverage is 97.91%.
```diff
@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.59%   96.61%   +0.02%
==========================================
  Files          48       49       +1
  Lines        5908     6004      +96
==========================================
+ Hits         5707     5801      +94
- Misses        201      203       +2
```
Might be a good time to revive this one. Is there anything that needs doing from my side here?
Attempting to bump this one again
CI failures here seem to be unrelated to this PR. I've fetched upstream and rebased, so not sure what's going on here (unless master is broken).
There is indeed an error in one of the existing tests on the master branch. I will look into it this week.
I have fixed the problem on the main branch, so you can update this PR with master.
If no one objects, I propose merging this into the main branch. pandas is already installed as a dependency of jams.
Thanks for merging this - I do think there are still some unresolved issues to clear up as noted in my original post:
- How should we handle the individual (anonymized, disaggregated) annotations?
- JAMS conversion is not done yet. This is currently blocked by https://github.com/mir-dataset-loaders/mirdata/issues/541
- More tests (?) -- the basics are here, but I wanted to get feedback on what's here so far before developing further.
Hi Brian! Could you add 1 and 3 as issues so we can keep track of them?
@nkundiushuti to be honest, this PR languished for so long that I don't have a clear recollection of the issues.
For (3), I don't think I ever fully understood what the existing tests cover, or what else could/should be added. I think this is more for the maintainers to decide than for contributors.
For (1), the question is pretty well described in #253 - I'd suggest re-opening that issue until it's entirely resolved. There are also some other finicky points in #253 that I raised for discussion but never received any feedback on (e.g., erroring out when users request random splits).