AudioLDM
RuntimeError: Error(s) in loading state_dict for LatentDiffusion: Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".
I can't run audioldm -t "A hammer is hitting a wooden surface", or any other command-line call that loads the AudioLDM checkpoint.
I installed everything following these steps:
Prepare running environment
conda create -n audioldm python=3.8; conda activate audioldm
pip3 install audioldm
git clone https://github.com/haoheliu/AudioLDM; cd AudioLDM
Start the web application (powered by Gradio)
python3 app.py
A link will be printed out. Click the link to open the browser and play.
I downloaded audioldm-m-full from the provided link, and I don't believe the file is corrupted.
Is something wrong with the LatentDiffusion model or its state_dict? How can I fix this?
Here's what I ran in the Anaconda Prompt and the output I got:
(audioldm) C:\AudioLDM\AudioLDM>audioldm -t "A hammer is hitting a wooden surface"
Load AudioLDM: %s audioldm-m-full
DiffusionWrapper has 415.95 M params.
C:\Users\Admin\anaconda3\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Admin\anaconda3\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "C:\Users\Admin\anaconda3\envs\audioldm\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Admin\anaconda3\envs\audioldm\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\AudioLDM\AudioLDM\audioldm_main.py", line 152, in
Thanks so much for any help!
I also tried the Colab notebook and got the same error.
I am facing the exact same error when trying to load the model in a conda environment created by following the README. The error is the same with both "audioldm-m-full.ckpt" and "audioldm-s-full.ckpt".
The Hugging Face Transformers library appears to have been updated automatically, and in the newer version the position_ids entry seems to have disappeared from the weights the model expects, which causes this error. Downgrading Transformers should resolve the problem.
I solved the issue with the command 'pip install --upgrade transformers==4.29.0'.
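If pinning Transformers is not an option, a commonly used alternative (not the fix confirmed in this thread) is to remove the stale key from the checkpoint before loading it. The sketch below assumes the .ckpt file is a dict that stores the model weights under a "state_dict" key; the helper names drop_stale_keys and patch_checkpoint and the file paths are hypothetical. It only removes this one key, so other incompatibilities with newer Transformers versions could still surface.

```python
"""Sketch: strip the stale position_ids entry from an AudioLDM checkpoint.

Assumptions (not from the thread): the checkpoint is a dict whose model
weights live under "state_dict"; helper names and paths are hypothetical.
"""
from typing import Any, Dict, List, Tuple

# Suffix of the key reported as unexpected in the error message.
STALE_SUFFIX = "text_branch.embeddings.position_ids"


def drop_stale_keys(
    state_dict: Dict[str, Any], suffix: str = STALE_SUFFIX
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of state_dict without keys ending in suffix, plus the removed keys.

    The input dict is not modified.
    """
    removed = [key for key in state_dict if key.endswith(suffix)]
    kept = {key: value for key, value in state_dict.items() if not key.endswith(suffix)}
    return kept, removed


def patch_checkpoint(src: str, dst: str) -> List[str]:
    """Load src, drop the stale keys from its "state_dict", and save the result to dst."""
    import torch  # deferred so drop_stale_keys can be used without torch installed

    checkpoint = torch.load(src, map_location="cpu")
    # Assumption: the weights are stored under the "state_dict" key.
    checkpoint["state_dict"], removed = drop_stale_keys(checkpoint["state_dict"])
    torch.save(checkpoint, dst)
    return removed
```

For example, patch_checkpoint("audioldm-m-full.ckpt", "audioldm-m-full-patched.ckpt") writes a copy without the offending key and returns the list of keys it removed; keep the original file as a backup.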
Wow thanks! It finally works! Hahaha
Thanks! Applied this fix to the Colab notebook.
I spent a whole day trying with conda, without conda (venv), and directly on the system, but AudioLDM just won't run for me. Looks like I'll give up on this one, though I really wish I could try it locally.
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".