audiomate

Extend dataset source to music datasets

Open · faroit opened this issue 4 years ago • 5 comments

Given the package name __audio__mate (instead of speechmate ;-)), it would be great if music datasets could also be included here. Many of the music datasets were already listed in #44, and for music datasets another Python package, mir_data, already exists. It comes with fewer features (e.g. no processing), as it focuses more on metadata than on audio loading. However, it would be great to find a way for both packages to coexist.

One way, for example, could be to make mir_data a dependency of audiomate and use it to load these additional datasets without duplicating code.

This issue is just meant to start a discussion, so I would like to include @rabitt @lostanlen @magdalenafuentes here.

faroit, Mar 11 '20 09:03

Thanks for bringing this to our attention.

We already have some music/noise datasets in our collection, for example GTZAN. So we do not have anything against music. We just have not worked that much with it, therefore the selection is significantly smaller.

Speaking for myself: I am open to collaboration. Increasing the number of dependencies is my least favourite option, though. We already have too many, and each one adds complexity for us and our users.

aahlenst, Mar 11 '20 09:03

cc @lostanlen @magdalenafuentes

Hey @aahlenst! We're also open to collaboration. For some context, the goal of mirdata is to act a bit like sklearn.datasets, but for music. mirdata is much less standardized than sklearn.datasets or audiomate because we're supporting many different tasks and task definitions.

We've converged on supporting:

  • downloaders
  • loaders for annotations and audio (most focus is on the annotation)
  • validation
  • compatibility with mir_eval
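To make the validation bullet a little more concrete: one common approach, assumed here rather than taken from mirdata's source, is to keep an index that maps each file's relative path to its expected MD5 checksum and compare a local copy of the dataset against it. The `validate` and `md5sum` names below are hypothetical; this is a minimal sketch, not mirdata's implementation.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(data_home: str, index: dict[str, str]) -> tuple[list[str], list[str]]:
    """Check a local dataset copy against a hypothetical {relative_path: expected_md5} index.

    Returns (missing_files, invalid_checksums), each a list of relative paths in sorted order.
    """
    root = Path(data_home)
    missing: list[str] = []
    invalid: list[str] = []
    for rel_path, expected in sorted(index.items()):
        path = root / rel_path
        if not path.is_file():
            missing.append(rel_path)  # file was never downloaded or was deleted
        elif md5sum(path) != expected:
            invalid.append(rel_path)  # file exists but its contents differ from the index
    return missing, invalid
```

Separating "missing" from "invalid" lets a downloader re-fetch only what is absent while flagging corrupted or modified files for the user.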

On our side, we don't have any plans to go beyond music datasets, and so far it seems like mirdata and audiomate are quite complementary. @faroit I'm curious to hear how you see these two libraries interacting (if at all), or how we can better support the use cases that audiomate provides and we don't.

rabitt, Mar 12 '20 21:03

@rabitt, sorry for the slow response. I do not have a strong opinion about how to collaborate. I am currently only reviewing this package, and to me there is a significant overlap between mirdata and this project that should be noted somewhere. I think the minimum solution would be for both projects to list each other under related data-loading Python packages. It might be worth discussing further steps in the future, but I would encourage the project owners @ynop and @rabitt/@lostanlen @magdalenafuentes to discuss this directly.

faroit, May 18 '20 14:05

If anyone wants to work on this: I'd create a separate module that depends on both libraries. This can either live here or in its own repository. This shields both projects from additional dependencies. If guidance is needed, or if infrastructure (interfaces, methods, ...) is missing on audiomate's side, please let us know.
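One way to picture such a separate module is a thin adapter that is the only place where both libraries meet. The sketch is hypothetical throughout: `MirTrack`, `CorpusBuilder`, `Utterance` and `tracks_to_corpus` are illustrative stand-ins, not the real mirdata or audiomate APIs. A real bridge would import both packages and map their actual track and corpus objects.

```python
"""Hypothetical bridge-module sketch: only this module would know about both libraries."""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class MirTrack:
    """Stand-in for a mirdata-style track: an ID, an audio path and one annotation."""
    track_id: str
    audio_path: str
    genre: Optional[str] = None


@dataclass
class Utterance:
    """Stand-in for an audiomate-style utterance: a file reference plus labels."""
    idx: str
    file_path: str
    labels: List[str] = field(default_factory=list)


@dataclass
class CorpusBuilder:
    """Stand-in for the corpus-side infrastructure the bridge would rely on."""
    utterances: Dict[str, Utterance] = field(default_factory=dict)

    def add_utterance(self, idx: str, file_path: str, labels: Iterable[str]) -> None:
        # Reject duplicate IDs so two tracks never silently overwrite each other.
        if idx in self.utterances:
            raise ValueError(f"duplicate utterance id: {idx}")
        self.utterances[idx] = Utterance(idx, file_path, list(labels))


def tracks_to_corpus(tracks: Iterable[MirTrack]) -> CorpusBuilder:
    """Map each track to one whole-file utterance, carrying the genre (if any) as a label."""
    corpus = CorpusBuilder()
    for track in tracks:
        labels = [track.genre] if track.genre is not None else []
        corpus.add_utterance(track.track_id, track.audio_path, labels)
    return corpus
```

Because the mapping lives in its own package, audiomate gains no mirdata dependency and vice versa; only users who install the bridge pull in both.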

aahlenst, May 18 '20 20:05

Hey @aahlenst! Thanks for taking the initiative on this, we're happy to help. Also, thanks @faroit for pointing this out; we agree we should discuss the best ways for both projects to coexist and hopefully enhance each other. The idea of a separate module sounds good, though could you explain a little more what you have in mind?

magdalenafuentes, May 18 '20 23:05