HunyuanVideoImageToVideoPipeline memory leak
Describe the bug
RuntimeError: Invalid buffer size: 635.13 GB on Apple Silicon with mps, when the readme claims 60GB. A smaller resolution gets further, but it still asks for roughly 10x the memory claimed. What's going on here? Is it leaking memory?
Reproducible from boilerplate example code:
import torch
from diffusers import HunyuanVideoImageToVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import load_image, export_to_video

# Available checkpoints: "hunyuanvideo-community/HunyuanVideo-I2V" and "hunyuanvideo-community/HunyuanVideo-I2V-33ch"
model_id = "hunyuanvideo-community/HunyuanVideo-I2V"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoImageToVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()
pipe.to("mps")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4", fps=15)
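For reference, the smaller-resolution runs I mentioned look roughly like this; the width/height/num_frames values below are just examples of what I tried, not anything official:

# Illustrative lower-memory call: a smaller spatial size and fewer frames shorten
# the token sequence the transformer attends over. Values here are examples only.
output = pipe(
    image=image,
    prompt=prompt,
    width=640,
    height=368,
    num_frames=61,               # (num_frames - 1) should be divisible by 4
    num_inference_steps=20,
).frames[0]
export_to_video(output, "output_small.mp4", fps=15)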
macOS Sequoia 15.5
Reproduction
see above
Logs
System Info
diffusers 0.34.0.dev0, Python 3.11, macOS Sequoia 15.5
Who can help?
No response
Hi @openSourcerer9000, unfortunately MPS sometimes lacks efficient kernels for certain operations, which could be the reason for the memory spike. Does the stack trace mention where in the inference process it OOMs?
No, just that it wants 600GB of memory. Do y'all have more info on memory benchmarks? The official repo mentions 60GB but not which optimizations they may be using. With a multi-model system it's hard to gauge how much memory would actually be expected.
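For what it's worth, here's the rough check I use to gauge weight memory per component (it only sums parameter sizes, so it says nothing about activations or attention buffers; pipe is the pipeline from the repro above):

import torch

# Rough gauge of weight memory per pipeline component (parameters only;
# ignores activations, attention buffers, and allocator overhead).
def param_gb(module: torch.nn.Module) -> float:
    return sum(p.numel() * p.element_size() for p in module.parameters()) / 1024**3

for name, component in pipe.components.items():
    if isinstance(component, torch.nn.Module):
        print(f"{name}: {param_gb(component):.2f} GB")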
I also don't see a difference when loading a Q4 version of the transformer, loading it first from a file and then passing it to the pipeline.
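For reference, I load the Q4 transformer roughly like this; the GGUF filename is just a placeholder for whatever quantized checkpoint is on disk. Quantization only shrinks the weights, so I wouldn't expect it to change the activation/attention memory anyway.

import torch
from diffusers import (
    GGUFQuantizationConfig,
    HunyuanVideoImageToVideoPipeline,
    HunyuanVideoTransformer3DModel,
)

model_id = "hunyuanvideo-community/HunyuanVideo-I2V"

# Placeholder path: point this at your local Q4 GGUF checkpoint.
transformer = HunyuanVideoTransformer3DModel.from_single_file(
    "path/to/hunyuan-video-i2v-Q4_K_M.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = HunyuanVideoImageToVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)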
Hi @openSourcerer9000, we don't run memory benchmarks on MPS, and this is likely specific to MPS. Sharing a complete stack trace would help the community chime in with solutions.
Any memory benchmarks would be good (CUDA or MPS). The published numbers come only from the official Hunyuan repo, and these things on arXiv are never peer reviewed anyway, so like any marketing they should be taken with a grain of salt.
Ran it again; there is actually a traceback:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[16], line 27
---> 27 output = pipe(image=img,
28 prompt=prompt,
29 prompt_2=prompt,
30 num_inference_steps=20,
31 width=1280,
32 height=720,
33 num_frames=129,
34 # # guidance_scale=7.5,
35 # negative_prompt="low quality, worst quality, low resolution, blurry, deformed, distorted, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, out of frame",
36 ).frames[0]
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video_image2video.py:914, in HunyuanVideoImageToVideoPipeline.__call__(self, image, prompt, prompt_2, negative_prompt, negative_prompt_2, height, width, num_frames, num_inference_steps, sigmas, true_cfg_scale, guidance_scale, num_videos_per_prompt, generator, latents, prompt_embeds, pooled_prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_pooled_prompt_embeds, negative_prompt_attention_mask, output_type, return_dict, attention_kwargs, callback_on_step_end, callback_on_step_end_tensor_inputs, prompt_template, max_sequence_length, image_embed_interleave)
911 elif image_condition_type == "token_replace":
912 latent_model_input = torch.cat([image_latents, latents[:, :, 1:]], dim=2).to(transformer_dtype)
--> 914 noise_pred = self.transformer(
915 hidden_states=latent_model_input,
916 timestep=timestep,
917 encoder_hidden_states=prompt_embeds,
918 encoder_attention_mask=prompt_attention_mask,
919 pooled_projections=pooled_prompt_embeds,
920 guidance=guidance,
921 attention_kwargs=attention_kwargs,
922 return_dict=False,
923 )[0]
925 if do_true_cfg:
926 neg_noise_pred = self.transformer(
927 hidden_states=latent_model_input,
928 timestep=timestep,
(...) 934 return_dict=False,
935 )[0]
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
1749 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1750 else:
-> 1751 return self._call_impl(*args, **kwargs)
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
1757 # If we don't have any hooks, we want to skip the rest of the logic in
1758 # this function, and just call forward.
1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1760 or _global_backward_pre_hooks or _global_backward_hooks
1761 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762 return forward_call(*args, **kwargs)
1764 result = None
1765 called_always_called_hooks = set()
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/diffusers/models/transformers/transformer_hunyuan_video.py:1109, in HunyuanVideoTransformer3DModel.forward(self, hidden_states, timestep, encoder_hidden_states, encoder_attention_mask, pooled_projections, guidance, attention_kwargs, return_dict)
1107 else:
1108 for block in self.transformer_blocks:
-> 1109 hidden_states, encoder_hidden_states = block(
1110 hidden_states,
1111 encoder_hidden_states,
1112 temb,
1113 attention_mask,
1114 image_rotary_emb,
1115 token_replace_emb,
1116 first_frame_num_tokens,
1117 )
1119 for block in self.single_transformer_blocks:
1120 hidden_states, encoder_hidden_states = block(
1121 hidden_states,
1122 encoder_hidden_states,
(...) 1127 first_frame_num_tokens,
1128 )
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
1749 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1750 else:
-> 1751 return self._call_impl(*args, **kwargs)
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
1757 # If we don't have any hooks, we want to skip the rest of the logic in
1758 # this function, and just call forward.
1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1760 or _global_backward_pre_hooks or _global_backward_hooks
1761 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762 return forward_call(*args, **kwargs)
1764 result = None
1765 called_always_called_hooks = set()
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/diffusers/models/transformers/transformer_hunyuan_video.py:789, in HunyuanVideoTokenReplaceTransformerBlock.forward(self, hidden_states, encoder_hidden_states, temb, attention_mask, freqs_cis, token_replace_emb, num_tokens)
784 norm_encoder_hidden_states, c_gate_msa, c_shift_mlp, c_scale_mlp, c_gate_mlp = self.norm1_context(
785 encoder_hidden_states, emb=temb
786 )
788 # 2. Joint attention
--> 789 attn_output, context_attn_output = self.attn(
790 hidden_states=norm_hidden_states,
791 encoder_hidden_states=norm_encoder_hidden_states,
792 attention_mask=attention_mask,
793 image_rotary_emb=freqs_cis,
794 )
796 # 3. Modulation and residual connection
797 hidden_states_zero = hidden_states[:, :num_tokens] + attn_output[:, :num_tokens] * tr_gate_msa.unsqueeze(1)
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
1749 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1750 else:
-> 1751 return self._call_impl(*args, **kwargs)
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
1757 # If we don't have any hooks, we want to skip the rest of the logic in
1758 # this function, and just call forward.
1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1760 or _global_backward_pre_hooks or _global_backward_hooks
1761 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762 return forward_call(*args, **kwargs)
1764 result = None
1765 called_always_called_hooks = set()
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/diffusers/models/attention_processor.py:605, in Attention.forward(self, hidden_states, encoder_hidden_states, attention_mask, **cross_attention_kwargs)
600 logger.warning(
601 f"cross_attention_kwargs {unused_kwargs} are not expected by {self.processor.__class__.__name__} and will be ignored."
602 )
603 cross_attention_kwargs = {k: w for k, w in cross_attention_kwargs.items() if k in attn_parameters}
--> 605 return self.processor(
606 self,
607 hidden_states,
608 encoder_hidden_states=encoder_hidden_states,
609 attention_mask=attention_mask,
610 **cross_attention_kwargs,
611 )
File /opt/anaconda3/envs/streem/lib/python3.11/site-packages/diffusers/models/transformers/transformer_hunyuan_video.py:120, in HunyuanVideoAttnProcessor2_0.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, image_rotary_emb)
117 value = torch.cat([value, encoder_value], dim=2)
119 # 5. Attention
--> 120 hidden_states = F.scaled_dot_product_attention(
121 query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
122 )
123 hidden_states = hidden_states.transpose(1, 2).flatten(2, 3)
124 hidden_states = hidden_states.to(query.dtype)
RuntimeError: Invalid buffer size: 635.13 GB
OK, this is a known issue with MPS, same as https://github.com/huggingface/diffusers/issues/9972
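For context, the size in the error is roughly what a fully materialized attention matrix would take at 720p x 129 frames, which is what you get when scaled_dot_product_attention has to fall back to the unfused path (an attention mask is passed here, and MPS presumably has no memory-efficient kernel for that case). A back-of-envelope estimate, with the head count, compression factors, and text-token count assumed rather than read from the checkpoint:

# Back-of-envelope estimate of the buffer SDPA tries to allocate when it has to
# materialize the full attention matrix. Assumed (not read from the config):
# 24 heads, fp16 scores, 8x VAE spatial compression x 2 patch size,
# 4x temporal compression, ~256 text tokens.
height, width, num_frames = 720, 1280, 129
heads, bytes_per_score, text_tokens = 24, 2, 256

video_tokens = ((num_frames - 1) // 4 + 1) * (height // 16) * (width // 16)
seq_len = video_tokens + text_tokens                      # roughly 119k tokens
attn_gib = heads * seq_len ** 2 * bytes_per_score / 1024 ** 3
print(f"{seq_len} tokens -> ~{attn_gib:.0f} GiB")         # in the ballpark of the 635.13 GB error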
Hm, MLX is higher maintenance when it comes to generating random tensors. It's possible MPS requires extra boilerplate to manage memory correctly too. https://github.com/ml-explore/mlx/issues/2254#event-18086129505
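If anyone wants to check whether memory actually leaks between runs, the MPS boilerplate I'm aware of looks like this (these helpers exist in recent PyTorch; whether they change anything for this particular allocation is another question):

import torch

# Inspect and release MPS memory between pipeline calls.
print(torch.mps.current_allocated_memory() / 1024**3, "GiB allocated by tensors")
print(torch.mps.driver_allocated_memory() / 1024**3, "GiB held by the Metal driver")

torch.mps.synchronize()  # wait for queued kernels to finish
torch.mps.empty_cache()  # release cached blocks back to the system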