audiolm-pytorch
Support our open source music pretrained Transformer
Hi, we are researchers from the MAP (music audio pre-train) project. We pre-train transformer LMs on large-scale music audio datasets.
See below. Our model, MERT, uses a similar method to HuBERT, and its performance has been verified on downstream music information retrieval tasks. It has been released on Hugging Face and can be loaded interchangeably with the HuBERT loading code: model = HubertModel.from_pretrained("m-a-p/MERT-v0")
We are currently working on training a better base model and scaling up to a large model with more music+speech data.
Using our weights as an initialization will be a better starting point than using the speech HuBERT. Better checkpoints will be released soon.
https://huggingface.co/m-a-p/MERT-v0
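For anyone who wants to try the Hugging Face checkpoint, here is a minimal sketch of loading MERT-v0 through the stock HuBERT class, as described above. The 16 kHz sampling rate and the chunk_waveform helper are my assumptions for illustration, not part of the official release; check the model card before relying on them.

```python
# Minimal sketch: loading the m-a-p/MERT-v0 checkpoint with the standard
# HuBERT class from transformers, as suggested in the thread.
import numpy as np


def load_mert(model_id: str = "m-a-p/MERT-v0"):
    """Load the MERT checkpoint through the stock HuBERT class.

    Works because MERT shares HuBERT's architecture; this downloads the
    weights from the Hugging Face Hub on first call.
    """
    from transformers import HubertModel  # pip install transformers

    model = HubertModel.from_pretrained(model_id)
    model.eval()
    return model


def chunk_waveform(wav: np.ndarray, sample_rate: int, seconds: float = 5.0):
    """Split a mono waveform into fixed-length chunks (last may be shorter).

    Hypothetical preprocessing helper: HuBERT-style models are typically
    trained on short clips, so long recordings are often processed piecewise.
    """
    step = int(sample_rate * seconds)
    return [wav[i:i + step] for i in range(0, len(wav), step)]


# Usage (requires torch and network access):
#   import torch
#   model = load_mert()
#   sr = 16_000  # assumed input rate; verify against the model card
#   wav = np.random.randn(sr * 12).astype(np.float32)  # 12 s dummy audio
#   with torch.no_grad():
#       feats = [model(torch.from_numpy(c)[None, :]).last_hidden_state
#                for c in chunk_waveform(wav, sr)]
```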
Hi @a43992899 have you guys run this model on http://hearbenchmark.com ? It would be useful to understand how well your approach generalizes across a wide variety of tasks, requiring different levels of understanding. The leaderboard still accepts submissions. Committee member @jordieshier is also at QMUL and can help you write the 80 lines of code to make your model HEAR API compatible.
@lucidrains apologies if you find my response to be hijacking the thread, if so just comment. I'm listening.
Sure, I think you've reached the right people. We would love to submit a result to HEAR~ We actually have a better checkpoint on the way, and we will try to submit some of our checkpoints to the benchmark~~
Hey Ruibin @a43992899, do you just mean it can be used interchangeably with transformers .from_pretrained, or will it work as a drop-in replacement with audiolm-pytorch?
@scf4 Hi, we are using the same fairseq codebase as the speech HuBERT, but we have only released the Hugging Face checkpoint for now. Since audiolm-pytorch uses the fairseq checkpoint, ours is not a drop-in replacement yet. We are planning to release the fairseq checkpoint in the future.
Hello, has your fairseq checkpoint been released yet?
@lzl1456 Not yet. Will have to wait a bit longer until we finish the paper.