
Not working on apple silicon (CogVideoX Fun Sampler Implementation)

Open defertoexpertise opened this issue 1 year ago • 17 comments

!!! Exception during processing !!! unsupported scalarType
Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 519, in process
    autocast_context = torch.autocast(mm.get_autocast_device(device)) if autocastcondition else nullcontext()
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 229, in __init__
    dtype = torch.get_autocast_dtype(device_type)
RuntimeError: unsupported scalarType

defertoexpertise avatar Sep 18 '24 15:09 defertoexpertise

I probably left the fp8 fast mode on, check that and put it to disabled to see if it resolves this. What GPU are you using?

kijai avatar Sep 18 '24 15:09 kijai

No, it's disabled. I'm on a MacBook, and the issue seems to be that autocast on MPS isn't supported in any PyTorch except nightly (as of a week ago), so that autocast to fp16 is breaking things. Oddly, when I went to nightly I started getting errors in prompt_embeds=positive.to(dtype).to(device): positive is a list, and a list doesn't have a .to method.
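A sketch of the kind of guard that would avoid this crash (make_autocast_context is a hypothetical helper, not the wrapper's actual code; it catches the same RuntimeError the trace above shows):

```python
import torch
from contextlib import nullcontext

# Hypothetical guard: if this torch build can't autocast the device
# (the RuntimeError in the trace above), run without autocast instead
# of crashing.
def make_autocast_context(device_type: str, dtype=torch.float16):
    try:
        return torch.autocast(device_type, dtype=dtype)
    except RuntimeError:  # e.g. "unsupported scalarType" on MPS
        return nullcontext()
```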

defertoexpertise avatar Sep 18 '24 15:09 defertoexpertise

prompt_embeds=positive.to(dtype).to(device), positive is a list and doesn't have a .to on list

Are you using the example workflow?

kijai avatar Sep 18 '24 15:09 kijai

Haha, I had overlooked that CogVideo was using different text nodes than the stock ones. Swapped to those, and now that passes. However, it now seems to break because something is hardcoded to use CUDA instead of falling back to MPS or CPU when CUDA isn't available... haven't tracked down where yet.

I updated pipeline_cogvideox.py, where you had a hardcoded torch.device("cuda"), to fall back to MPS or CPU (shown below), but that doesn't seem to be the call that's got me hung, and what's odd is I can't find any other hardcoded references to cuda that would break things.
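That edit, formatted as a block:

```python
import torch

# Prefer CUDA, then Apple's MPS backend, then CPU.
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
```

The traceback I still hit: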

Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 535, in process
    latents = pipe(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 634, in __call__
    self.vae.to(device)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 310, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

defertoexpertise avatar Sep 18 '24 16:09 defertoexpertise

Strange thing: in that inpainting file, if I throw in a print to see what device is before it tries to send the VAE to a device... the device is set via device = self._execution_device, and if I print it, device is "cuda:0"...

defertoexpertise avatar Sep 18 '24 16:09 defertoexpertise

Yeah, I'm not sure where that _execution_device is getting set. Even if I hardcode that instance of it to "mps" or "cpu", it seems it's somehow used elsewhere and still tries to force things onto CUDA... which Macs don't have.

defertoexpertise avatar Sep 18 '24 16:09 defertoexpertise

Yeah, I'm not sure where that _execution_device is getting set. Even if I hardcode that instance of it to "mps" or "cpu", it seems it's somehow used elsewhere and still tries to force things onto CUDA... which Macs don't have.

I think it defaults to cuda if it can't find it from accelerate... not sure why that wouldn't work. You can try just forcing the execution device to mps, though.

kijai avatar Sep 18 '24 16:09 kijai

If you mean just setting self._execution_device = "mps", that won't work; it's apparently not allowed:

AttributeError: can't set attribute '_execution_device'...

A bit of digging suggests that diffusers returns whatever device is set in the model's _hf_hook... which is returning cuda:0.
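For reference, diffusers resolves that property roughly like this (a paraphrase of DiffusionPipeline._execution_device; details vary by diffusers version):

```python
import torch

# Paraphrased sketch: walk the pipeline's modules looking for an
# accelerate hook and return that hook's execution device, falling
# back to the pipeline's own device attribute.
def resolve_execution_device(pipe):
    for name, model in pipe.components.items():
        if not isinstance(model, torch.nn.Module):
            continue
        for module in model.modules():
            hook = getattr(module, "_hf_hook", None)
            if getattr(hook, "execution_device", None) is not None:
                return torch.device(hook.execution_device)
    return pipe.device
```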

defertoexpertise avatar Sep 18 '24 17:09 defertoexpertise

Potentially found the reason: I wasn't calling enable_model_cpu_offload with a device, so it would default to cuda.
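A minimal sketch of that fix, assuming pipe is the already-constructed pipeline (recent diffusers versions accept a device argument here):

```python
import torch

# Pass the real device explicitly; without it, the accelerate offload
# hooks default to CUDA, which Macs don't have.
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
pipe.enable_model_cpu_offload(device=device)
```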

kijai avatar Sep 18 '24 18:09 kijai

Yep, that solved that issue. So now, with PyTorch 2.6.0-dev (needed for autocast to work in the pipeline), it doesn't give the device error anymore. Great catch; optional parameters are so easy to overlook in these codebases.

So close to it running, lol, I can feel it! XD

Now the hang is at ...

File "/Users/cc/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 64, in forward
    return super().forward(input)
  File "/Users/cc/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/cc/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
    return F.conv3d(
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I get the feeling the dtype isn't being passed somewhere it needs to be for float16.
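A hypothetical workaround for that (a sketch, not a confirmed fix; encode_matching_dtype is a made-up helper): cast the VAE input to the module's parameter dtype before encoding, so F.conv3d sees matching dtypes.

```python
import torch

# Match the input dtype to the VAE's weights/bias dtype before
# encoding, so F.conv3d gets consistent input/weight/bias dtypes.
def encode_matching_dtype(vae, pixel_values: torch.Tensor):
    vae_dtype = next(vae.parameters()).dtype
    return vae.encode(pixel_values.to(vae_dtype))
```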

defertoexpertise avatar Sep 18 '24 18:09 defertoexpertise

Yeah, I'm not sure why, but it seems that conv3d is sometimes float32... while the input is float16...

This is my setup, btw:

[screenshot of the workflow setup]

And here's the full trace:

!!! Exception during processing !!! Input type (float) and bias type (c10::Half) should be the same
Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 530, in process
    latents = pipe(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 719, in __call__
    _, masked_video_latents = self.prepare_mask_latents(
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 340, in prepare_mask_latents
    mask_pixel_values_bs = self.vae.encode(mask_pixel_values_bs)[0]
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 1120, in encode
    z_intermediate = self.encoder(z_intermediate)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 739, in forward
    hidden_states = down_block(hidden_states, temb, None)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 415, in forward
    hidden_states = resnet(hidden_states, temb, zq)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 297, in forward
    hidden_states = self.conv1(hidden_states)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 149, in forward
    output = self.conv(inputs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 69, in forward
    return super().forward(input)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
    return F.conv3d(
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

Also, I saw you mention diffusers 0.30.3 is required; I was on 0.30.2, but upgrading didn't change anything.

defertoexpertise avatar Sep 19 '24 19:09 defertoexpertise

Diffusers 0.30.3 is required for the official I2V model only, not the "Fun" variant.

Does that work for you, btw, or is this only an issue with the "Fun" models?

kijai avatar Sep 19 '24 19:09 kijai

Cleared my folder and pulled the latest from the git repo, then tested the 2b models with their respective samplers... got very similar errors, but slightly different.

with standard 2b (first in list)...

!!! Exception during processing !!! Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 455, in process
    latents = pipeline["pipe"](
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 607, in __call__
    noise_pred = self.transformer(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 456, in forward
    hidden_states, encoder_hidden_states = block(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 131, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 490, in forward
    return self.processor(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1925, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.

with fun 2b...

!!! Exception during processing !!! Input type (float) and bias type (c10::Half) should be the same
Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 641, in process
    latents = pipe(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 718, in __call__
    _, masked_video_latents = self.prepare_mask_latents(
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 339, in prepare_mask_latents
    mask_pixel_values_bs = self.vae.encode(mask_pixel_values_bs)[0]
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 1114, in encode
    z_intermediate = self.encoder(z_intermediate)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 733, in forward
    hidden_states = down_block(hidden_states, temb, None)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 409, in forward
    hidden_states = resnet(hidden_states, temb, zq)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 291, in forward
    hidden_states = self.conv1(hidden_states)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 144, in forward
    output = self.conv(inputs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 64, in forward
    return super().forward(input)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
    return F.conv3d(
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

defertoexpertise avatar Sep 19 '24 22:09 defertoexpertise

I tried to resolve the above while running the 5B I2V model; it seems to be a deeper issue within the CogVideo diffusers model or in the MPS implementation of PyTorch (though I can't be sure). I am leaving these details here in case someone picks this up:

  1. The precision sent from the codebase in this repository seems to be working correctly (I was running at float32 precision, and all the tensors sent to the underlying model had the same precision).
  2. After the following line of code in the diffusers library: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py#L1924, the query and key tensors have a dtype of float32 whereas the value tensor has a dtype of float16 (which matches the errors above).

At point 2, I tried forcing the precision to float32 for all tensors, and also forcing them to float16, before the call to scaled_dot_product_attention. In both cases, my MacBook gave an OOM error (I have the 36 GB RAM model).
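Roughly, the patch attempted looks like this (sdpa_same_dtype is a hypothetical helper around the diffusers call site, not a confirmed fix):

```python
import torch
import torch.nn.functional as F

# Force q/k/v to a single dtype before SDPA, which otherwise raises
# the mixed-dtype RuntimeError quoted above.
def sdpa_same_dtype(query, key, value, target_dtype=torch.float16):
    return F.scaled_dot_product_attention(
        query.to(target_dtype), key.to(target_dtype), value.to(target_dtype)
    )
```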

Might try to set this up on a GPU instance somewhere using an Nvidia card ¯\_(ツ)_/¯

digvijay7 avatar Sep 21 '24 10:09 digvijay7

Well, float32 precision will likely OOM without any bugs or issues. For bf16 they show 16 GB (confirmed, as it can OOM even on a T4 Colab); they even mention in the Colabs that it can OOM on 16 GB of VRAM and memory. I imagine some of this, in this Comfy extension, is the tensor shuffling chewing up memory, but I definitely think it needs to run in fp16 to have a chance of running locally on 36 GB…

Keep in mind that on Macs, offloading doesn't do anything, since VRAM/RAM is unified; we'd have to switch to completely unloading extraneous stuff, not just shifting it to the CPU.
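A sketch of what "completely unloading" could look like on MPS (hypothetical; pipe.text_encoder is just an example component):

```python
import gc
import torch

# On unified memory, moving a module to "cpu" frees nothing; dropping
# the reference and emptying the MPS cache actually reclaims memory.
del pipe.text_encoder
gc.collect()
if torch.backends.mps.is_available():
    torch.mps.empty_cache()
```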

cchance27 avatar Sep 21 '24 12:09 cchance27

I have a similar issue running the 5B I2V model on a MacBook Pro M3 Max (128 GB RAM, latest Sonoma), with Python 3.12.4 (miniconda3) and PyTorch 2.6.0.dev20240924. This happens regardless of using the flags --force-fp16, --force-fp32, or --dont-upcast-attention.

RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.

Let me know if you'd prefer I open another issue or run a few tests given this machine's memory. I've also tried brute forcing types as @digvijay7 mentioned above, to no avail. Any insights are welcome, thanks!

The full error output is:

** Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 10:07:17) [Clang 14.0.6 ]
** Python executable: /Users/u/miniconda3/bin/python
** ComfyUI Path: /Users/u/ComfyUI
** Log path: /Users/u/ComfyUI/comfyui.log

Prestartup times for custom nodes:
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/rgthree-comfy
   0.3 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-Manager

Total VRAM 131072 MB, total RAM 131072 MB
pytorch version: 2.6.0.dev20240924
Forcing FP16.
Set vram state to: SHARED
Device: mps
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Prompt Server] web root: /Users/u/ComfyUI/web
/Users/u/miniconda3/lib/python3.12/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
### Loading: ComfyUI-Manager (V2.51.1)
### ComfyUI Revision: 2727 [fdf37566] | Released on '2024-09-24'

[rgthree] Loaded 42 exciting nodes.
[rgthree] NOTE: Will NOT use rgthree's optimized recursive execution as ComfyUI has changed.

Total VRAM 131072 MB, total RAM 131072 MB
pytorch version: 2.6.0.dev20240924
Forcing FP16.
Set vram state to: SHARED
Device: mps
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json

Import times for custom nodes:
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/websocket_image_save.py
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/rgthree-comfy
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-KJNodes
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-Manager
   0.1 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
   0.2 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Encoded latents shape: torch.Size([1, 1, 16, 60, 90])
/Users/u/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Requested to load SD3ClipModel_
Loading 1 new model
loaded completely 0.0 4541.693359375 True
Temporal tiling disabled
  0%|          | 0/50 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  0%|          | 0/50 [00:00<?, ?it/s]
!!! Exception during processing !!! Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
Traceback (most recent call last):
  File "/Users/u/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/u/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 843, in process
    latents = pipeline["pipe"](
              ^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 615, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 456, in forward
    hidden_states, encoder_hidden_states = block(
                                           ^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 131, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
                                                     ^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 490, in forward
    return self.processor(
           ^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 1925, in __call__
    hidden_states = F.scaled_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.

trogloditee avatar Sep 27 '24 16:09 trogloditee

Can confirm this is an issue on M2 Max chips. Just going to leave this here: https://github.com/pytorch/pytorch/issues/110285

BenRacicot avatar Oct 05 '24 15:10 BenRacicot