
Hunyuan I2V model fails

Open · emerdem opened this issue 11 months ago · 6 comments

hunyuanvideo_24G.py works fine; however, when I run hunyuanvideo_i2v_24G.py, I run into the following issue:

(diffstudio) C:\Users\Emre\OneDrive\Documents\AI_YT\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo>python hunyuanvideo_i2v_24G.py
Downloading models: ['HunyuanVideoI2V']
Loading models from: models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "C:\Users\Emre\OneDrive\Documents\AI_YT\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo\hunyuanvideo_i2v_24G.py", line 11, in <module>
    model_manager.load_models(
  File "C:\Users\Emre\.conda\envs\diffstudio\Lib\site-packages\diffsynth\models\model_manager.py", line 422, in load_models
    self.load_model(file_path, model_names, device=device, torch_dtype=torch_dtype)
  File "C:\Users\Emre\.conda\envs\diffstudio\Lib\site-packages\diffsynth\models\model_manager.py", line 404, in load_model
    if model_detector.match(file_path, state_dict):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Emre\.conda\envs\diffstudio\Lib\site-packages\diffsynth\models\model_manager.py", line 165, in match
    if len(state_dict) == 0:
       ^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

emerdem · Mar 21 '25 12:03

models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt must not be there. I had the same issue with Wan, so I simply copied it over from another directory where I had downloaded it.

hakzarov · Mar 22 '25 23:03

The error TypeError: object of type 'NoneType' has no len() occurs because models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt was not downloaded correctly, possibly due to a network error. You can run the script again to re-download the model, or manually download mp_rank_00_model_states.pt and place it at models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt.
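As a quick sanity check before re-running the pipeline, you can verify the file is present and guard against the None state dict that triggers the error. This is a minimal sketch, not part of DiffSynth-Studio; both helper names and the size threshold are hypothetical.

```python
import os

def checkpoint_download_ok(path, min_bytes=1_000_000):
    """Return True if the checkpoint file exists and is not obviously truncated.

    Hypothetical helper: a complete mp_rank_00_model_states.pt is several GB,
    so a missing or tiny file points to a failed download.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

def state_dict_is_usable(state_dict):
    """Guard against a None or empty state dict, the exact value that makes
    len(state_dict) raise TypeError inside model_manager.py."""
    return state_dict is not None and len(state_dict) > 0
```

If checkpoint_download_ok("models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt") returns False, delete the partial file and re-download before loading models.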

mi804 · Mar 24 '25 03:03

Thanks a lot for the comments. I downloaded the models manually from ModelScope and placed them in the right folders; they now load correctly.

However, I now run into this issue when running hunyuanvideo_i2v_24G.py:

(diffstudio) C:\AI\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo>python hunyuanvideo_i2v_24G.py
Downloading models: ['HunyuanVideoI2V']
Loading models from: models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt
    model_name: hunyuan_video_dit model_class: HunyuanVideoDiT
The following models are loaded: ['hunyuan_video_dit'].
Loading models from: models/HunyuanVideoI2V/text_encoder/model.safetensors
    model_name: sd3_text_encoder_1 model_class: SD3TextEncoder1
The following models are loaded: ['sd3_text_encoder_1'].
Loading models from: models/HunyuanVideoI2V/text_encoder_2
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.74it/s]
The following models are loaded: ['hunyuan_video_text_encoder_2'].
Loading models from: models/HunyuanVideoI2V/vae/pytorch_model.pt
    model_name: hunyuan_video_vae_decoder model_class: HunyuanVideoVAEDecoder
    model_name: hunyuan_video_vae_encoder model_class: HunyuanVideoVAEEncoder
The following models are loaded: ['hunyuan_video_vae_decoder', 'hunyuan_video_vae_encoder'].
Using sd3_text_encoder_1 from models/HunyuanVideoI2V/text_encoder/model.safetensors.
Using hunyuan_video_text_encoder_2 from models/HunyuanVideoI2V/text_encoder_2.
Using hunyuan_video_dit from models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt.
Using hunyuan_video_vae_decoder from models/HunyuanVideoI2V/vae/pytorch_model.pt.
Using hunyuan_video_vae_encoder from models/HunyuanVideoI2V/vae/pytorch_model.pt.
Downloading Dataset to directory: C:\AI\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo
2025-03-24 22:55:43,157 - modelscope - INFO - Fetching dataset repo file list...
2025-03-24 22:55:46,710 - modelscope - INFO - Got 1 files, start to download ...
Downloading [data/examples/hunyuanvideo/0.jpg]: 100%|████████████████████████████████| 401k/401k [00:02<00:00, 158kB/s]
Processing 1 items: 100%|███████████████████████████████████████████████████████████| 1.00/1.00 [00:02<00:00, 2.60s/it]
2025-03-24 22:55:49,312 - modelscope - INFO - Download dataset 'DiffSynth-Studio/examples_in_diffsynth' successfully.
Traceback (most recent call last):
  File "C:\AI\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo\hunyuanvideo_i2v_24G.py", line 42, in <module>
    video = pipe(prompt, input_images=images, num_inference_steps=50, seed=0, i2v_resolution=i2v_resolution)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Emre\.conda\envs\diffstudio\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: HunyuanVideoPipeline.__call__() got an unexpected keyword argument 'input_images'

emerdem · Mar 24 '25 23:03

You may need to update the DiffSynth-Studio codebase and reinstall it, as the latest HunyuanVideoPipeline supports the input_images parameter: https://github.com/modelscope/DiffSynth-Studio/blob/3dc28f428f3accf1b49bcc639c4dea91e48d11d0/diffsynth/pipelines/hunyuan_video.py#L133
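After reinstalling, you can check whether the installed pipeline actually accepts the new parameter without running a full generation. A minimal sketch using the standard library; supports_kwarg is a hypothetical helper, not a DiffSynth API:

```python
import inspect

def supports_kwarg(func, name):
    """Return True if `func` accepts a keyword argument called `name`,
    either explicitly or via a **kwargs catch-all."""
    params = inspect.signature(func).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

# Usage against the installed pipeline (assumes diffsynth is importable):
# from diffsynth.pipelines.hunyuan_video import HunyuanVideoPipeline
# supports_kwarg(HunyuanVideoPipeline.__call__, "input_images")
```

If this returns False, the update did not take effect: Python is still importing the old package (for example, from site-packages instead of the freshly pulled source tree).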

mi804 · Mar 25 '25 02:03

Thank you, I have pulled the latest code and reinstalled. This fixed the input_images error, but now I encountered this one:

(diffstudio) C:\AI\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo>python hunyuanvideo_i2v_24G.py
Downloading models: ['HunyuanVideoI2V']
model.safetensors has been already in models/HunyuanVideoI2V/text_encoder.
model-00001-of-00004.safetensors has been already in models/HunyuanVideoI2V/text_encoder_2.
model-00002-of-00004.safetensors has been already in models/HunyuanVideoI2V/text_encoder_2.
model-00003-of-00004.safetensors has been already in models/HunyuanVideoI2V/text_encoder_2.
model-00004-of-00004.safetensors has been already in models/HunyuanVideoI2V/text_encoder_2.
config.json has been already in models/HunyuanVideoI2V/text_encoder_2.
model.safetensors.index.json has been already in models/HunyuanVideoI2V/text_encoder_2.
pytorch_model.pt has been already in models/HunyuanVideoI2V/vae.
mp_rank_00_model_states.pt has been already in models/HunyuanVideoI2V/transformers.
Loading models from: models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt
    model_name: hunyuan_video_dit model_class: HunyuanVideoDiT
The following models are loaded: ['hunyuan_video_dit'].
Loading models from: models/HunyuanVideoI2V/text_encoder/model.safetensors
    model_name: sd3_text_encoder_1 model_class: SD3TextEncoder1
The following models are loaded: ['sd3_text_encoder_1'].
Loading models from: models/HunyuanVideoI2V/text_encoder_2
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.32it/s]
The following models are loaded: ['hunyuan_video_text_encoder_2'].
Loading models from: models/HunyuanVideoI2V/vae/pytorch_model.pt
    model_name: hunyuan_video_vae_decoder model_class: HunyuanVideoVAEDecoder
    model_name: hunyuan_video_vae_encoder model_class: HunyuanVideoVAEEncoder
The following models are loaded: ['hunyuan_video_vae_decoder', 'hunyuan_video_vae_encoder'].
Using sd3_text_encoder_1 from models/HunyuanVideoI2V/text_encoder/model.safetensors.
Using hunyuan_video_text_encoder_2 from models/HunyuanVideoI2V/text_encoder_2.
Using hunyuan_video_dit from models/HunyuanVideoI2V/transformers/mp_rank_00_model_states.pt.
Using hunyuan_video_vae_decoder from models/HunyuanVideoI2V/vae/pytorch_model.pt.
Using hunyuan_video_vae_encoder from models/HunyuanVideoI2V/vae/pytorch_model.pt.
Downloading Dataset to directory: C:\AI\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo
2025-03-25 21:24:53,738 - modelscope - INFO - Fetching dataset repo file list...
Traceback (most recent call last):
  File "C:\AI\FluxVideoGen\DiffSynth-Studio\examples\HunyuanVideo\hunyuanvideo_i2v_24G.py", line 42, in <module>
    video = pipe(prompt, input_images=images, num_inference_steps=50, seed=0, i2v_resolution=i2v_resolution)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Emre\.conda\envs\diffstudio\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "c:\ai\fluxvideogen\diffsynth-studio\diffsynth\pipelines\hunyuan_video.py", line 190, in __call__
    prompt_emb_posi = self.encode_prompt(prompt, positive=True, input_images=input_images)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\ai\fluxvideogen\diffsynth-studio\diffsynth\pipelines\hunyuan_video.py", line 106, in encode_prompt
    prompt_emb, pooled_prompt_emb, text_mask = self.prompter.encode_prompt(
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\ai\fluxvideogen\diffsynth-studio\diffsynth\prompters\hunyuan_video_prompter.py", line 271, in encode_prompt
    prompt_emb, attention_mask = self.encode_prompt_using_mllm(prompt_formated, images, llm_sequence_length, device,
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\ai\fluxvideogen\diffsynth-studio\diffsynth\prompters\hunyuan_video_prompter.py", line 165, in encode_prompt_using_mllm
    image_outputs = self.processor(images, return_tensors="pt")["pixel_values"].to(device)
                    ^^^^^^^^^^^^^^
AttributeError: 'HunyuanVideoPrompter' object has no attribute 'processor'

emerdem · Mar 25 '25 21:03

There may be some differences between the latest code and your local copy. It is recommended to download the latest version of DiffSynth-Studio and install it from source. It is also recommended to use transformers==4.47.0.
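To confirm the environment matches the recommendation, you can query the installed transformers version programmatically. A minimal sketch using the standard library; installed_version is a hypothetical helper name:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Example (assumes transformers is installed in the active environment):
# if installed_version("transformers") != "4.47.0":
#     print("consider: pip install transformers==4.47.0")
```

Running this in the same conda environment as the script also confirms which installation Python is actually picking up, which matters when a source checkout and a site-packages copy coexist.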

mi804 · Mar 26 '25 02:03