ValueError: Cannot load because encoder.conv_out.conv.weight expected shape torch.Size([129, 512, 3, 3, 3]), but got torch.Size([129, 2048, 3, 3, 3]).
import torch
from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel
# `single_file_url` could also be https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors
single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.5.safetensors"
transformer = LTXVideoTransformer3DModel.from_single_file(
    single_file_url, torch_dtype=torch.bfloat16
)
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video", transformer=transformer, vae=vae, torch_dtype=torch.bfloat16
)
# ... inference code ...
The code above fails with the following error:
Traceback (most recent call last):
File "./ltx-video/1.py", line 9, in <module>
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
File "/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/python3.10/site-packages/diffusers/loaders/single_file_model.py", line 355, in from_single_file
unexpected_keys = load_model_dict_into_meta(
File "/python3.10/site-packages/diffusers/models/model_loading_utils.py", line 230, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load because encoder.conv_out.conv.weight expected shape torch.Size([129, 512, 3, 3, 3]), but got torch.Size([129, 2048, 3, 3, 3]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
It seems that version 0.9.5 is not yet integrated into diffusers.
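For reference, `from_single_file` accepts an explicit `config` (and `subfolder`) pointing at a diffusers-format repo, so on a diffusers version whose single-file converter knows the 0.9.5 architecture the load would look roughly like the sketch below. This is only a sketch: the repo id "Lightricks/LTX-Video-0.9.5" is an assumption, and the shape mismatch above suggests the converter itself may still need updating.

import torch
from diffusers import AutoencoderKLLTXVideo, LTXVideoTransformer3DModel

single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.5.safetensors"

# Build each model from an explicit diffusers-format config so the correct
# architecture is instantiated before the single-file weights are loaded.
# "Lightricks/LTX-Video-0.9.5" is an assumed repo id; substitute the
# diffusers-format 0.9.5 repo you actually use.
vae = AutoencoderKLLTXVideo.from_single_file(
    single_file_url,
    config="Lightricks/LTX-Video-0.9.5",
    subfolder="vae",
    torch_dtype=torch.bfloat16,
)
transformer = LTXVideoTransformer3DModel.from_single_file(
    single_file_url,
    config="Lightricks/LTX-Video-0.9.5",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)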
+1, same problem here.
Same problem here; only v0.9.1 is supported for training.
@maxin-cn @Zh1ym 0.9.5 is integrated into diffusers for inference; I am not sure about training.
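For example, loading the diffusers-format checkpoint with the regular `from_pretrained` path instead of `from_single_file` would look roughly like this (a minimal sketch; the repo id "Lightricks/LTX-Video-0.9.5", the input image path, and the generation parameters are assumptions, not copied from the docs):

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# "Lightricks/LTX-Video-0.9.5" is an assumed diffusers-format repo id for the
# 0.9.5 release; adjust it to whatever repo actually hosts the weights.
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.5", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("first_frame.png")  # placeholder conditioning image
video = pipe(
    image=image,
    prompt="a short description of the desired motion",
    num_frames=97,  # LTX-Video works best with num_frames of the form 8*k + 1
    num_inference_steps=30,
).frames[0]
export_to_video(video, "output.mp4", fps=24)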
I have integrated it here (all three inference types are supported): https://github.com/newgenai79/sd-diffuser-webui
You can refer to this file for the code: https://github.com/newgenai79/sd-diffuser-webui/blob/main/modules/video_generation/tab_ltx095.py
Any updates on this? I am running into the same problem and can't use 0.9.5 for diffusers inference.