MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Last week the pre-print, pre-trained models, and training code for MERT were released:
- https://arxiv.org/abs/2306.00107
- https://huggingface.co/m-a-p/MERT-v1-330M
- https://github.com/yizhilll/MERT
- https://huggingface.co/spaces/m-a-p/Music-Descriptor
- https://huggingface.co/spaces/m-a-p/MERT-Music-Genre-Tagging-Prediction
The paper reports good performance on 14 different music understanding tasks using the representations learned by the model. Given this flexibility, I think it could be quite interesting to integrate this model into polymath.
What are some of the design considerations (functional & non-functional) for integrating this model into polymath effectively?
I think the main benefit would be extracting deep features and using them, e.g., for similarity search.
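Something like this is what I had in mind for the feature-extraction side. It's an untested sketch based on the transformers usage shown on the MERT model card; the file name and the mean-pooling over time are just placeholders, not anything polymath does today:

```python
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, AutoModel

# MERT ships custom model code, so trust_remote_code is required.
processor = Wav2Vec2FeatureExtractor.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)
model = AutoModel.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)

# Load a track, downmix to mono, and resample to the rate the processor expects.
waveform, sr = torchaudio.load("track.wav")  # placeholder path
mono = torchaudio.functional.resample(waveform.mean(dim=0), sr, processor.sampling_rate)

inputs = processor(mono.numpy(), sampling_rate=processor.sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Pool over time to get one embedding per layer; which layer (or weighted
# combination) works best for search would need to be evaluated.
hidden = torch.stack(outputs.hidden_states)      # [layers, batch, time, dim]
track_embedding = hidden.mean(dim=2).squeeze(1)  # [layers, dim]
```

These pooled embeddings could then be indexed for nearest-neighbour search alongside the existing polymath metadata.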
However, maybe some of the downstream tagging tasks (instrument, mood, genre) could be used as well? Even if the results are not state-of-the-art, a single shared backbone would save maintaining a separate model for each of these tags.
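For the tagging idea, a lightweight probe on top of the pooled embeddings might be enough, roughly along these lines. This is only a sketch: the array files, label format, and the choice of a single logistic-regression probe per tag family are all hypothetical, not something from the paper or polymath:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder inputs: pooled MERT embeddings (as in the sketch above) and
# integer class ids for one tag family, e.g. genre.
embeddings = np.load("mert_track_embeddings.npy")  # [n_tracks, dim]
genre_labels = np.load("genre_labels.npy")         # [n_tracks]

# One small probe per tag family (genre, mood, instrument) reuses the same
# backbone features instead of running a separate tagging model for each.
probe = LogisticRegression(max_iter=1000)
probe.fit(embeddings, genre_labels)
predicted = probe.predict(embeddings[:5])
```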