[Bug] Wan22-TI2V Inference Error
Description
When running ./scripts/wan22/run_wan22_ti2v_i2v.sh (with model_path/lightx2v path configured), it first reports this error:

  File "/host_home/projects/lightx2v/lightx2v/models/video_encoders/hf/wan/vae_2_2.py", line 755, in encode
    x[:, :, :1, :, :],
    ~^^^^^^^^^^^^^^^^
IndexError: too many indices for tensor of dimension 4

It seems that img is a 4D tensor at this point, while the encoder expects a 5D tensor.

    # to tensor
    img = TF.to_tensor(img).sub_(0.5).div_(0.5).cuda().unsqueeze(1)
    vae_encoder_out = self.get_vae_encoder_output(img)

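To make the shape mismatch concrete, here is a minimal sketch of the shape bookkeeping in plain Python, so it runs without a GPU or PyTorch. It assumes that TF.to_tensor returns a (C, H, W) tensor and that the encoder expects (B, C, T, H, W), which is inferred from the five-index slice in the traceback. The image size and both helper names are hypothetical, and whether an extra unsqueeze(0) is the intended fix in LightX2V is not confirmed here.

```python
def unsqueeze_shape(shape, dim):
    """Shape after inserting a size-1 axis at `dim` (same rule as torch.unsqueeze for dim >= 0)."""
    return shape[:dim] + (1,) + shape[dim:]


def can_index(shape, num_indices):
    """A tensor accepts at most as many indices as it has dimensions."""
    return num_indices <= len(shape)


chw = (3, 704, 1280)              # hypothetical image size: (C, H, W) from TF.to_tensor
cthw = unsqueeze_shape(chw, 1)    # .unsqueeze(1)       -> (C, T=1, H, W), still 4D
bcthw = unsqueeze_shape(cthw, 0)  # extra .unsqueeze(0) -> (B=1, C, T=1, H, W), 5D

# x[:, :, :1, :, :] in vae_2_2.py uses five indices.
print(cthw, can_index(cthw, 5))    # 4D shape: five indices are too many
print(bcthw, can_index(bcthw, 5))  # 5D shape: five indices are accepted
```
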
After adding a batch dimension to the img tensor, it reports a further error:

  File "/host_home/projects/lightx2v/lightx2v/models/runners/default_runner.py", line 198, in _run_input_encoder_local_i2v
    vae_encode_out, latent_shape = self.run_vae_encoder(img_ori if self.vae_encoder_need_img_original else img)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

It seems that Wan22Dense only returns vae_encode_out, but this runner requires 2 return values.
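The error message is consistent with that reading: when a function returns a single object and the caller unpacks it into two names, Python iterates over the object, and a tensor iterates along its first dimension. The sketch below reproduces the failure mode with a plain list standing in for the tensor. The function names, the size 48, and the latent shape are hypothetical placeholders and are not taken from LightX2V.

```python
def encoder_returns_one():
    """Hypothetical stand-in: returns one sequence-like object (a tensor iterates over dim 0)."""
    return [0.0] * 48  # 48 is an arbitrary first-dimension size


def encoder_returns_two():
    """Hypothetical stand-in: returns the (output, latent_shape) pair the caller unpacks."""
    return [0.0] * 48, (48, 1, 44, 80)  # placeholder latent shape


def try_unpack(fn):
    """Mimic `vae_encode_out, latent_shape = self.run_vae_encoder(...)`."""
    try:
        vae_encode_out, latent_shape = fn()
        return "ok"
    except ValueError as exc:
        return str(exc)


print(try_unpack(encoder_returns_one))  # ValueError text about too many values to unpack
print(try_unpack(encoder_returns_two))  # unpacks cleanly
```

If this is what happens, either the encoder path would need to return the latent shape as well, or the runner would need to accept a single return value for this model; which of the two the maintainers intend is an open question.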
What is the correct way to run Wan2.2-TI2V-5B inference? Is this a bug in the code, or am I using it incorrectly?
Steps to Reproduce
I ran docker pull lightx2v/lightx2v:25101501-cu128, created a Docker container from that image, ran git clone https://github.com/ModelTC/LightX2V, and then ran ./scripts/wan22/run_wan22_ti2v_i2v.sh.
Expected Result
Describe the normal behavior you expected.
Actual Result
Describe the abnormal situation that actually occurred.
Environment Information
- Operating System: [e.g., Ubuntu 22.04]
- Commit ID: [Version of the project]
Log Information
Please provide relevant error logs or debugging information.
Additional Information
If there is any other information that can help solve the problem, please add it here.