
Support our open-source music pretrained Transformer

Open a43992899 opened this issue 2 years ago • 6 comments

Hi, we are researchers from the MAP (music audio pre-training) project. We pre-train transformer LMs on large-scale music audio datasets; see the link below. Our model, MERT, uses a method similar to HuBERT, and its performance has been verified on downstream music information retrieval tasks. It has been released on Hugging Face and can be loaded interchangeably with the HuBERT loading code: `model = HubertModel.from_pretrained("m-a-p/MERT-v0")`. We are currently training a better base model and scaling up to a large model with more music and speech data. Using our weights as an initialization should be a better starting point than speech HuBERT. Better checkpoints will be released soon.

https://huggingface.co/m-a-p/MERT-v0
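For context, since MERT shares the speech HuBERT architecture, it loads through the stock `transformers` HuBERT classes. A minimal sketch of the input/output contract is below; the tiny random config is only there to illustrate the shapes without downloading the real weights, and its layer sizes are made up:

```python
# Sketch: MERT reuses the HuBERT architecture, so it loads through the
# standard `transformers` HuBERT classes. The pretrained line (commented
# out) is from the model card; the tiny config below just demonstrates
# the input/output contract offline with randomly initialized weights.
import torch
from transformers import HubertConfig, HubertModel

# Pretrained usage (downloads weights from the Hugging Face Hub):
# model = HubertModel.from_pretrained("m-a-p/MERT-v0")

# Offline demo with a small, randomly initialized model (hypothetical sizes):
config = HubertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    conv_dim=(64,) * 7,  # keep the default 7-layer conv feature extractor
)
model = HubertModel(config).eval()

wav = torch.randn(1, 16000)  # 1 second of 16 kHz mono audio
with torch.no_grad():
    hidden = model(wav).last_hidden_state  # (batch, frames, hidden_size)
```

The conv feature extractor downsamples by a factor of 320 (strides 5·2·2·2·2·2·2), so one second of 16 kHz audio yields roughly 50 frames of embeddings, one per ~20 ms.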

a43992899 avatar Jan 28 '23 19:01 a43992899

Hi @a43992899, have you guys run this model on http://hearbenchmark.com? It would be useful to understand how well your approach generalizes across a wide variety of tasks that require different levels of understanding. The leaderboard still accepts submissions. Committee member @jordieshier is also at QMUL and can help you write the ~80 lines of code to make your model HEAR API compatible.

@lucidrains apologies if you find my response to be hijacking the thread; if so, just comment. I'm listening.
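For anyone curious what "HEAR API compatible" involves: the benchmark's common API expects a module exposing `load_model`, `get_timestamp_embeddings`, and `get_scene_embeddings`. A rough sketch of that shape is below; the `DummyEmbedder` stands in for MERT, and its hop size and embedding dimension are assumptions, not the real model's values:

```python
# Hypothetical sketch of a HEAR-API-compatible wrapper module. The three
# entry points (load_model, get_timestamp_embeddings, get_scene_embeddings)
# are what the hearbenchmark.com common API expects; the model internals
# here are a placeholder, not MERT itself.
import torch

class DummyEmbedder(torch.nn.Module):
    """Stand-in for a MERT-like model that emits one embedding per frame."""
    sample_rate = 16000            # HEAR reads this attribute off the model
    scene_embedding_size = 64      # assumed sizes, for illustration only
    timestamp_embedding_size = 64
    hop_ms = 20.0                  # one frame per 20 ms, HuBERT-style

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        hop = int(self.sample_rate * self.hop_ms / 1000)
        frames = audio.shape[-1] // hop
        return torch.randn(audio.shape[0], frames, self.timestamp_embedding_size)

def load_model(model_file_path: str = "") -> torch.nn.Module:
    # Real implementation would load checkpoint weights from the path.
    return DummyEmbedder()

def get_timestamp_embeddings(audio: torch.Tensor, model):
    emb = model(audio)  # (n_sounds, n_frames, dim)
    n = emb.shape[1]
    # Timestamps are frame centres, in milliseconds.
    ts = (torch.arange(n) * model.hop_ms + model.hop_ms / 2).expand(audio.shape[0], n)
    return emb, ts

def get_scene_embeddings(audio: torch.Tensor, model) -> torch.Tensor:
    # Simple average pooling over time; real wrappers may pool differently.
    emb, _ = get_timestamp_embeddings(audio, model)
    return emb.mean(dim=1)
```

With a real backbone dropped into `forward`, this is close to the "80 lines" mentioned above.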

turian avatar Jan 30 '23 08:01 turian

> Hi @a43992899, have you guys run this model on http://hearbenchmark.com? It would be useful to understand how well your approach generalizes across a wide variety of tasks that require different levels of understanding. The leaderboard still accepts submissions. Committee member @jordieshier is also at QMUL and can help you write the ~80 lines of code to make your model HEAR API compatible.
>
> @lucidrains apologies if you find my response to be hijacking the thread; if so, just comment. I'm listening.

Sure, I think you've reached the right people. We would love to submit a result to HEAR~ Actually, we have a better checkpoint on the way, and we will try to submit some of our checkpoints to the benchmark~~

a43992899 avatar Jan 30 '23 19:01 a43992899

Hey Ruibin @a43992899, do you just mean it can be used interchangeably via `transformers`' `.from_pretrained`, or will it work as a drop-in replacement within audiolm-pytorch?

scf4 avatar Feb 19 '23 16:02 scf4

@scf4 Hi, we are using the same fairseq codebase as the speech HuBERT, but we have only released the Hugging Face checkpoint for now. It seems audiolm-pytorch uses the fairseq checkpoint, so ours is not a drop-in replacement. We plan to release the fairseq checkpoint in the future.

a43992899 avatar Feb 23 '23 02:02 a43992899

> @scf4 Hi, we are using the same fairseq codebase as the speech HuBERT, but we have only released the Hugging Face checkpoint for now. It seems audiolm-pytorch uses the fairseq checkpoint, so ours is not a drop-in replacement. We plan to release the fairseq checkpoint in the future.

Hello, has your fairseq checkpoint been released yet?

lzl1456 avatar Mar 14 '23 08:03 lzl1456

@lzl1456 Not yet. You'll have to wait a bit longer, until we finish the paper.

a43992899 avatar Apr 18 '23 03:04 a43992899