AudioLDM RuntimeError: Error(s) in loading state_dict for LatentDiffusion: Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position

I can't run audioldm -t "A hammer is hitting a wooden surface" or anything else on the command line that loads the AudioLDM checkpoint.

I installed everything following the steps here:

Prepare running environment

    conda create -n audioldm python=3.8; conda activate audioldm
    pip3 install audioldm
    git clone https://github.com/haoheliu/AudioLDM; cd AudioLDM

Start the web application (powered by Gradio)

    python3 app.py

A link will be printed out. Click the link to open the browser and play.

I downloaded audioldm-m-full from the link. I don't believe it's corrupted.

Is there something wrong with latent diffusion? State_dict? How can I fix the issue?

Here's what I ran in the anaconda prompt and what I got out of it.

(audioldm) C:\AudioLDM\AudioLDM>audioldm -t "A hammer is hitting a wooden surface"
Load AudioLDM: %s audioldm-m-full
DiffusionWrapper has 415.95 M params.
C:\Users\Admin\anaconda3\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Admin\anaconda3\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\envs\audioldm\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Admin\anaconda3\envs\audioldm\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\AudioLDM\AudioLDM\audioldm_main.py", line 152, in <module>
    audioldm = build_model(model_name=args.model_name)
  File "C:\AudioLDM\AudioLDM\audioldm\pipeline.py", line 86, in build_model
    latent_diffusion.load_state_dict(checkpoint["state_dict"])
  File "C:\Users\Admin\anaconda3\envs\audioldm\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".

Thanks so much for any help!

Jul 24 '23 03:07 Inkune

Additionally, I tried the Colab notebook and got the same error.

Jul 24 '23 07:07 Inkune

I am facing the exact same error when trying to load the model within a conda environment created by following the README. The error occurs with both "audioldm-m-full.ckpt" and "audioldm-s-full.ckpt".

Jul 24 '23 21:07 cvillela

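The Hugging Face transformers library seems to have been updated automatically, and in the newer version the position_ids entry no longer appears among the keys the model expects, while the checkpoint still contains it. That mismatch causes the error. Downgrading transformers should resolve the problem.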
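To illustrate the mismatch, here is a minimal sketch that uses plain dictionaries in place of real tensors. Only the position_ids key comes from the traceback; the other key and the placeholder values are made up for illustration, and drop_unexpected_keys is a hypothetical helper, not part of AudioLDM or PyTorch. It is untested against AudioLDM itself.

```python
from typing import Any, Dict, Iterable, List, Tuple

# The key reported in the traceback above.
POSITION_IDS_KEY = "cond_stage_model.model.text_branch.embeddings.position_ids"


def drop_unexpected_keys(
    state_dict: Dict[str, Any], expected_keys: Iterable[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (filtered state_dict, sorted list of keys that were dropped)."""
    expected = set(expected_keys)
    kept = {k: v for k, v in state_dict.items() if k in expected}
    dropped = sorted(k for k in state_dict if k not in expected)
    return kept, dropped


# Stand-ins for checkpoint["state_dict"] and the keys the model expects.
# The values would be tensors in real code; strings are used here for brevity.
checkpoint_state = {
    "cond_stage_model.model.text_branch.embeddings.word_embeddings.weight": "tensor A",
    POSITION_IDS_KEY: "tensor B",
}
model_keys = [
    "cond_stage_model.model.text_branch.embeddings.word_embeddings.weight",
]

filtered, dropped = drop_unexpected_keys(checkpoint_state, model_keys)
print("Dropped keys:", dropped)
```

In real code the expected keys would come from latent_diffusion.state_dict().keys(). PyTorch's load_state_dict also accepts strict=False, which ignores unexpected keys but hides genuinely missing ones too, so filtering a single known key is the narrower option.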

I solved the issue with the command 'pip install --upgrade transformers==4.29.0'.
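If you want to check which transformers version is installed before loading the model, something like the following sketch might help. The function names are hypothetical, and treating every release newer than 4.29.0 as possibly affected is my assumption; the only thing confirmed here is that 4.29.0 works.

```python
from importlib import metadata
from typing import Optional, Tuple

# Version reported to work in this thread.
KNOWN_GOOD = "4.29.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.31.0' into (4, 31, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than_known_good(installed: str, known_good: str = KNOWN_GOOD) -> bool:
    """True if the installed version is strictly newer than the known-good one."""
    return parse_version(installed) > parse_version(known_good)


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed in this environment")
elif is_newer_than_known_good(version):
    print(f"transformers {version} is newer than {KNOWN_GOOD}; consider downgrading")
else:
    print(f"transformers {version} is not newer than {KNOWN_GOOD}")
```

Run it inside the audioldm conda environment so that it inspects the same packages the audioldm command uses.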

Jul 25 '23 08:07 seaone1007

Wow thanks! It finally works! Hahaha

Jul 25 '23 09:07 Inkune

Thanks! Applied this fix to the Colab notebook.

Jul 25 '23 09:07 olaviinha

I spent a whole day trying with conda, without conda (venv), and directly on the system, and AudioLDM just won't run for me. Looks like I'll give up on this one, though I really wish I could try it locally.

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']

Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".

Aug 17 '23 17:08 Ariffffff
