ValueError: Cannot load because encoder.conv_out.conv.weight expected shape torch.Size([129, 512, 3, 3, 3]), but got torch.Size([129, 2048, 3, 3, 3]).
import torch
from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel
# `single_file_url` could also be https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors
single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.5.safetensors"
transformer = LTXVideoTransformer3DModel.from_single_file(
    single_file_url, torch_dtype=torch.bfloat16
)
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video", transformer=transformer, vae=vae, torch_dtype=torch.bfloat16
)
# ... inference code ...
The code above fails with the following error:
Traceback (most recent call last):
File "./ltx-video/1.py", line 9, in <module>
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
File "/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/python3.10/site-packages/diffusers/loaders/single_file_model.py", line 355, in from_single_file
unexpected_keys = load_model_dict_into_meta(
File "/python3.10/site-packages/diffusers/models/model_loading_utils.py", line 230, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load because encoder.conv_out.conv.weight expected shape torch.Size([129, 512, 3, 3, 3]), but got torch.Size([129, 2048, 3, 3, 3]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
It seems that version 0.9.5 is not yet integrated into diffusers.
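For reference, `from_single_file` accepts an explicit `config` (and `subfolder`) pointing at a diffusers-format repo, so on a diffusers version whose single-file converter knows the 0.9.5 architecture the load would look roughly like the sketch below. This is only a sketch: the repo id "Lightricks/LTX-Video-0.9.5" is an assumption, and the shape mismatch above suggests the converter itself may still need updating.

import torch
from diffusers import AutoencoderKLLTXVideo, LTXVideoTransformer3DModel

single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.5.safetensors"

# Build each model from an explicit diffusers-format config so the correct
# architecture is instantiated before the single-file weights are loaded.
# "Lightricks/LTX-Video-0.9.5" is an assumed repo id; substitute the
# diffusers-format 0.9.5 repo you actually use.
vae = AutoencoderKLLTXVideo.from_single_file(
    single_file_url,
    config="Lightricks/LTX-Video-0.9.5",
    subfolder="vae",
    torch_dtype=torch.bfloat16,
)
transformer = LTXVideoTransformer3DModel.from_single_file(
    single_file_url,
    config="Lightricks/LTX-Video-0.9.5",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)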
+1, same problem here.
Same problem here; only v0.9.1 is supported for training.
@maxin-cn @Zh1ym 0.9.5 is integrated into diffusers for inference; I am not sure about training.
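For example, loading the diffusers-format checkpoint with the regular `from_pretrained` path instead of `from_single_file` would look roughly like this (a minimal sketch; the repo id "Lightricks/LTX-Video-0.9.5", the input image path, and the generation parameters are assumptions, not copied from the docs):

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# "Lightricks/LTX-Video-0.9.5" is an assumed diffusers-format repo id for the
# 0.9.5 release; adjust it to whatever repo actually hosts the weights.
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.5", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("first_frame.png")  # placeholder conditioning image
video = pipe(
    image=image,
    prompt="a short description of the desired motion",
    num_frames=97,  # LTX-Video works best with num_frames of the form 8*k + 1
    num_inference_steps=30,
).frames[0]
export_to_video(video, "output.mp4", fps=24)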
I have integrated it here (all three inference types are supported): https://github.com/newgenai79/sd-diffuser-webui
You can refer to this file for the code: https://github.com/newgenai79/sd-diffuser-webui/blob/main/modules/video_generation/tab_ltx095.py
Any updates on this? I am running into the same problem and can't use 0.9.5 for diffusers inference.