
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

JCBrouwer opened this issue on Jun 08 '23

Last week the pre-print, pre-trained models, and training code for MERT were released:

  • https://arxiv.org/abs/2306.00107
  • https://huggingface.co/m-a-p/MERT-v1-330M
  • https://github.com/yizhilll/MERT
  • https://huggingface.co/spaces/m-a-p/Music-Descriptor
  • https://huggingface.co/spaces/m-a-p/MERT-Music-Genre-Tagging-Prediction

The paper reports good performance on 14 different music understanding tasks using the representations learned by the model. Given this flexibility, I think it could be quite interesting to integrate this model into polymath.

What are some of the design considerations (functional & non-functional) for implementing this model effectively?

I think the main benefit would be simply extracting deep features and using them, e.g. for similarity search (see the sketch below).
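For reference, here is a minimal sketch of what feature extraction could look like, loosely following the usage example on the MERT-v1-330M model card. The `embed` helper, the mono mixdown, and the time/layer mean-pooling are my own assumptions, not anything polymath or the paper prescribes:

```python
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, AutoModel

# Load the model and its feature extractor as shown on the MERT-v1-330M model card.
model = AutoModel.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)

def embed(path: str) -> torch.Tensor:
    """Return one fixed-size embedding vector for an audio file (hypothetical helper)."""
    wav, sr = torchaudio.load(path)
    wav = wav.mean(dim=0)  # mix down to mono
    if sr != processor.sampling_rate:  # MERT-v1 expects 24 kHz input
        wav = torchaudio.functional.resample(wav, sr, processor.sampling_rate)
    inputs = processor(wav.numpy(), sampling_rate=processor.sampling_rate, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states: one [batch, time, dim] tensor per layer -> [n_layers, time, dim]
    hidden = torch.stack(out.hidden_states).squeeze(1)
    # Average over time, then over layers; which layer(s) to keep is a design choice
    # (the paper probes individual layers per downstream task).
    return hidden.mean(dim=1).mean(dim=0)

# Search could then be plain cosine similarity between a query embedding and the library:
# scores = torch.nn.functional.cosine_similarity(embed(query).unsqueeze(0), library_matrix)
```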

However, maybe some of the downstream tagging tasks like instrument, mood, or genre could be useful as well? In that case, even if the results aren't state-of-the-art, it would avoid needing a separate model for each of those tags.
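If tagging turns out to be worth it, one cheap option might be a small linear probe on top of the same embeddings rather than shipping separate task-specific models. A rough sketch, reusing the hypothetical `embed` helper above; the file paths and labels are placeholders, and this is simpler than the per-layer probing protocol used in the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder labelled data; in practice these would come from the user's library or a tagging dataset.
paths = ["track_01.wav", "track_02.wav"]
labels = ["rock", "jazz"]

# Pool each track into a single vector with the embed() sketch above, then fit a linear probe.
X = np.stack([embed(p).numpy() for p in paths])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

print(clf.predict([embed("new_track.wav").numpy()])[0])
```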

JCBrouwer · Jun 08 '23