MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Last week the pre-print, pre-trained models, and training code for MERT were released:
- https://arxiv.org/abs/2306.00107
- https://huggingface.co/m-a-p/MERT-v1-330M
- https://github.com/yizhilll/MERT
- https://huggingface.co/spaces/m-a-p/Music-Descriptor
- https://huggingface.co/spaces/m-a-p/MERT-Music-Genre-Tagging-Prediction
The paper reports good performance on 14 different music understanding tasks using the representations learned by the model. Given this flexibility, I think it could be quite interesting to integrate this model into polymath.
What are some of the design considerations (functional & non-functional) for integrating this model into polymath effectively?
I think the main benefit would be extracting deep features and using them, e.g., for similarity search.
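Something like this is what I had in mind for the feature-extraction side. It's an untested sketch based on the transformers usage shown on the MERT model card; the file name and the mean-pooling over time are just placeholders, not anything polymath does today:

```python
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, AutoModel

# MERT ships custom model code, so trust_remote_code is required.
processor = Wav2Vec2FeatureExtractor.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)
model = AutoModel.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)

# Load a track, downmix to mono, and resample to the rate the processor expects.
waveform, sr = torchaudio.load("track.wav")  # placeholder path
mono = torchaudio.functional.resample(waveform.mean(dim=0), sr, processor.sampling_rate)

inputs = processor(mono.numpy(), sampling_rate=processor.sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Pool over time to get one embedding per layer; which layer (or weighted
# combination) works best for search would need to be evaluated.
hidden = torch.stack(outputs.hidden_states)      # [layers, batch, time, dim]
track_embedding = hidden.mean(dim=2).squeeze(1)  # [layers, dim]
```

These pooled embeddings could then be indexed for nearest-neighbour search alongside the existing polymath metadata.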
However, maybe some of the downstream tagging tasks (instrument, mood, genre) could be used as well? Even if the results are not state-of-the-art, a single shared backbone would save maintaining a separate model for each of these tags.
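For the tagging idea, a lightweight probe on top of the pooled embeddings might be enough, roughly along these lines. This is only a sketch: the array files, label format, and the choice of a single logistic-regression probe per tag family are all hypothetical, not something from the paper or polymath:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder inputs: pooled MERT embeddings (as in the sketch above) and
# integer class ids for one tag family, e.g. genre.
embeddings = np.load("mert_track_embeddings.npy")  # [n_tracks, dim]
genre_labels = np.load("genre_labels.npy")         # [n_tracks]

# One small probe per tag family (genre, mood, instrument) reuses the same
# backbone features instead of running a separate tagging model for each.
probe = LogisticRegression(max_iter=1000)
probe.fit(embeddings, genre_labels)
predicted = probe.predict(embeddings[:5])
```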