[Bug] Wan22-TI2V Inference Error

Open JeremieMelo opened this issue 2 months ago • 1 comments

Description

When running ./scripts/wan22/run_wan22_ti2v_i2v.sh (with model_path and the lightx2v path configured), it first reports this error:

    File "/host_home/projects/lightx2v/lightx2v/models/video_encoders/hf/wan/vae_2_2.py", line 755, in encode
        x[:, :, :1, :, :],
        ~^^^^^^^^^^^^^^^^
    IndexError: too many indices for tensor of dimension 4

It seems that img is a 4D tensor here, while the encoder requires a 5D tensor:

    # to tensor
    img = TF.to_tensor(img).sub_(0.5).div_(0.5).cuda().unsqueeze(1)
    vae_encoder_out = self.get_vae_encoder_output(img)
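The shape mismatch can be reproduced in isolation. The following is a minimal sketch, assuming a random CPU tensor as a stand-in for TF.to_tensor(img) and assuming that the missing axis is a leading batch dimension; it only mirrors the unsqueeze call from the runner snippet and the 5-index slice from the traceback, not the actual LightX2V code path.

```python
import torch

# Stand-in for TF.to_tensor(img): an RGB image as a (C, H, W) float tensor in [0, 1].
# The 8x8 size is arbitrary and only keeps the example small.
img = torch.rand(3, 8, 8)

# Same normalization and unsqueeze(1) as the runner snippet (without .cuda()).
frame = img.sub(0.5).div(0.5).unsqueeze(1)  # (C, 1, H, W), i.e. 4D

# The slice from vae_2_2.py uses five indices, so it fails on a 4D tensor.
try:
    frame[:, :, :1, :, :]
    slice_error = None
except IndexError as exc:
    slice_error = str(exc)

# Assumed workaround: add a leading batch dimension so the tensor is 5D.
batched = frame.unsqueeze(0)  # (1, C, 1, H, W)
first_frame = batched[:, :, :1, :, :]

print(frame.shape, slice_error)
print(batched.shape, first_frame.shape)
```

Whether unsqueeze(0) is the layout the encoder actually expects is not confirmed here; the sketch only shows that a 5D input gets past the indexing step.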

After adding a batch dimension to the img tensor, it reports a second error:

    File "/host_home/projects/lightx2v/lightx2v/models/runners/default_runner.py", line 198, in _run_input_encoder_local_i2v
        vae_encode_out, latent_shape = self.run_vae_encoder(img_ori if self.vae_encoder_need_img_original else img)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: too many values to unpack (expected 2)

It seems that Wan22Dense only returns vae_encode_out, but this runner requires 2 return values.
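This reading is consistent with how Python unpacks a lone tensor: iterating a tensor walks its first dimension, so any first dimension larger than 2 produces exactly this message. A small sketch follows, where fake_encoder and returns_pair are hypothetical helpers and the tensor shape is arbitrary; neither is part of LightX2V.

```python
import torch


def fake_encoder():
    # Hypothetical stand-in for an encoder that returns only the latent tensor.
    # The shape is arbitrary; only "first dimension > 2" matters here.
    return torch.zeros(4, 1, 2, 2)


def returns_pair(result):
    # True only if the callee returned an explicit 2-tuple, which is what
    # "a, b = f()" needs to unpack cleanly.
    return isinstance(result, tuple) and len(result) == 2


result = fake_encoder()
is_pair = returns_pair(result)

# Unpacking the lone tensor iterates over its first dimension (4 items here),
# so Python raises ValueError instead of binding two names.
try:
    vae_encode_out, latent_shape = result
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

print(is_pair, unpack_error)
```

A check like returns_pair, placed just before the unpacking line in a local debugging copy, would show whether the encoder path in use returns a single tensor or a pair.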

What is the correct way to run Wan2.2-TI2V-5B inference? Is this a bug in the code, or am I using it incorrectly?

Steps to Reproduce

I ran docker pull lightx2v/lightx2v:25101501-cu128, created a Docker container from that image, ran git clone https://github.com/ModelTC/LightX2V inside it, and then ran the script.

Oct 26 '25 21:10 JeremieMelo