
Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype. SD3.5 FP8 Mac M2

Open Creative-comfyUI opened this issue 1 year ago • 19 comments

Expected Behavior

To render the image

Actual Behavior

No image is produced; the run fails with an error.

Steps to Reproduce

Use the ComfyUI workflow from the wiki page.

Debug Logs

I cannot paste the entire debug log; it is too long.

2024-11-07 23:08:51,027 - root - INFO - Total VRAM 16384 MB, total RAM 16384 MB
2024-11-07 23:08:51,027 - root - INFO - pytorch version: 2.4.1
2024-11-07 23:08:51,027 - root - INFO - Set vram state to: SHARED
2024-11-07 23:08:51,027 - root - INFO - Device: mps
2024-11-07 23:08:51,707 - root - INFO - Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
2024-11-07 23:08:52,888 - root - INFO - [Prompt Server] web root: /Volumes/mac_disk/AI/ComfyUI/web
2024-11-07 23:08:53,559 - root - WARNING - Traceback (most recent call last):
  File "/Volumes/mac_disk/AI/ComfyUI/nodes.py", line 2012, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 990, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1127, in get_code
  File "<frozen importlib._bootstrap_external>", line 1185, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/Volumes/mac_disk/AI/ComfyUI/custom_nodes/clipseg/__init__.py'

2024-11-07 23:08:53,560 - root - WARNING - Cannot import /Volumes/mac_disk/AI/ComfyUI/custom_nodes/clipseg module for custom nodes: [Errno 2] No such file or directory: '/Volumes/mac_disk/AI/ComfyUI/custom_nodes/clipseg/__init__.py'
2024-11-07 23:08:55,756 - root - INFO - Total VRAM 16384 MB, total RAM 16384 MB
2024-11-07 23:08:55,756 - root - INFO - pytorch version: 2.4.1
2024-11-07 23:08:55,756 - root - INFO - Set vram state to: SHARED
2024-11-07 23:08:55,756 - root - INFO - Device: mps
2024-11-07 23:09:05,211 - root - INFO - 
Import times for custom nodes:

2024-11-07 23:09:05,218 - root - INFO - Starting server

2024-11-07 23:09:05,218 - root - INFO - To see the GUI go to: http://127.0.0.1:8188
2024-11-07 23:09:26,847 - root - INFO - got prompt
2024-11-07 23:09:27,368 - root - INFO - Using split attention in VAE
2024-11-07 23:09:27,369 - root - INFO - Using split attention in VAE
2024-11-07 23:09:33,750 - root - INFO - model weight dtype torch.bfloat16, manual cast: None
2024-11-07 23:09:33,754 - root - INFO - model_type FLOW
2024-11-07 23:09:40,701 - root - INFO - Requested to load FluxClipModel_
2024-11-07 23:09:40,702 - root - INFO - Loading 1 new model
2024-11-07 23:09:40,705 - root - INFO - loaded completely 0.0 323.94775390625 True
2024-11-07 23:09:42,747 - root - INFO - Requested to load FluxClipModel_
2024-11-07 23:09:42,747 - root - INFO - Loading 1 new model
2024-11-07 23:11:19,126 - root - INFO - Requested to load Flux
2024-11-07 23:11:19,127 - root - INFO - Loading 1 new model
2024-11-07 23:12:45,079 - root - INFO - loaded completely 0.0 7880.297119140625 True
2024-11-07 23:17:57,484 - root - INFO - got prompt
2024-11-07 23:20:42,637 - root - INFO - Processing interrupted
2024-11-07 23:20:42,647 - root - INFO - Prompt executed in 675.79 seconds
2024-11-07 23:22:15,177 - root - INFO - Unloading models for lowram load.
2024-11-07 23:22:43,277 - root - INFO - 1 models unloaded.
2024-11-07 23:22:43,401 - root - INFO - Loading 1 new model
2024-11-07 23:22:59,452 - root - INFO - loaded completely 0.0 7880.297119140625 True
2024-11-07 23:35:20,213 - root - INFO - Requested to load AutoencodingEngine
2024-11-07 23:35:20,225 - root - INFO - Loading 1 new model
2024-11-07 23:35:58,888 - root - INFO - loaded completely 0.0 319.7467155456543 True
2024-11-07 23:36:10,025 - root - INFO - Prompt executed in 924.68 seconds
2024-11-07 23:44:30,055 - root - INFO - got prompt
2024-11-07 23:45:55,950 - root - INFO - Unloading models for lowram load.
2024-11-07 23:45:56,125 - root - INFO - 1 models unloaded.
2024-11-07 23:45:56,125 - root - INFO - Loading 1 new model
2024-11-07 23:46:13,772 - root - INFO - loaded completely 0.0 7880.297119140625 True
2024-11-08 00:14:36,066 - root - INFO - Requested to load AutoencodingEngine
2024-11-08 00:14:36,070 - root - INFO - Loading 1 new model
2024-11-08 00:15:31,253 - root - INFO - loaded completely 0.0 319.7467155456543 True
2024-11-08 00:16:16,545 - root - INFO - Prompt executed in 1906.03 seconds
2024-11-08 00:20:47,717 - root - INFO - got prompt
2024-11-08 00:22:16,020 - root - INFO - Unloading models for lowram load.
2024-11-08 00:22:16,106 - root - INFO - 1 models unloaded.
2024-11-08 00:22:16,106 - root - INFO - Loading 1 new model
2024-11-08 00:22:38,298 - root - INFO - loaded completely 0.0 7880.297119140625 True
2024-11-08 00:22:53,124 - root - INFO - got prompt
2024-11-08 00:22:53,547 - root - ERROR - Failed to validate prompt for output 9:
2024-11-08 00:22:53,547 - root - ERROR - * CheckpointLoaderSimple 4:
2024-11-08 00:22:53,547 - root - ERROR -   - Value not in list: ckpt_name: 'sd3.5_large_fp8_scaled.safetensors' not in (list of length 82)
2024-11-08 00:22:53,547 - root - ERROR - Output will be ignored
2024-11-08 00:22:53,547 - root - WARNING - invalid prompt: {'type': 'prompt_outputs_failed_validation', 'message': 'Prompt outputs failed validation', 'details': '', 'extra_info': {}}
2024-11-08 00:23:41,232 - root - INFO - got prompt
2024-11-08 00:45:14,511 - root - INFO - Requested to load AutoencodingEngine
2024-11-08 00:45:14,519 - root - INFO - Loading 1 new model
2024-11-08 00:46:03,350 - root - INFO - loaded completely 0.0 319.7467155456543 True
2024-11-08 00:46:40,530 - root - INFO - Prompt executed in 1552.44 seconds
2024-11-08 00:46:53,921 - root - INFO - model weight dtype torch.float16, manual cast: None
2024-11-08 00:46:53,927 - root - INFO - model_type FLOW
2024-11-08 00:49:06,020 - root - INFO - Using split attention in VAE
2024-11-08 00:49:06,026 - root - INFO - Using split attention in VAE
2024-11-08 00:49:10,767 - root - INFO - Requested to load SD3ClipModel_
2024-11-08 00:49:10,767 - root - INFO - Loading 1 new model
2024-11-08 00:49:10,775 - root - INFO - loaded completely 0.0 6228.190093994141 True
2024-11-08 00:51:12,123 - root - INFO - Requested to load SD3ClipModel_
2024-11-08 00:51:12,123 - root - INFO - Loading 1 new model
2024-11-08 00:51:12,131 - root - INFO - loaded completely 0.0 6102.49609375 True
2024-11-08 00:51:20,350 - root - WARNING - clip missing: ['text_projection.weight']
2024-11-08 00:54:36,959 - root - INFO - Requested to load SD3
2024-11-08 00:54:37,017 - root - INFO - Loading 1 new model
2024-11-08 00:55:28,542 - root - INFO - loaded completely 0.0 7683.561706542969 True
2024-11-08 00:55:30,326 - root - ERROR - !!! Exception during processing !!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.
2024-11-08 00:55:30,608 - root - ERROR - Traceback (most recent call last):
  File "/Volumes/mac_disk/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

Other

No response

Creative-comfyUI avatar Nov 08 '24 06:11 Creative-comfyUI

Why do I have a feeling that nobody wants to solve this problem? There are a million messages on the internet about this error, and nobody solves anything.

Djon253 avatar Nov 10 '24 19:11 Djon253

Why do I have a feeling that nobody wants to solve this problem? There are a million messages on the internet about this error, and nobody solves anything.

I may have an idea: for some reason, it seems that on Mac you can only use GGUF models, not the native SD3.5 and Flux models.

Creative-comfyUI avatar Nov 10 '24 22:11 Creative-comfyUI

Same issue with the native Flux model. Will try with GGUF.

igor-elbert avatar Nov 11 '24 19:11 igor-elbert

Good news: I found a way to run SD3.5 on Mac without this problem. Start the server with:

PYTORCH_ENABLE_MPS_FALLBACK=1 python3 main.py

It is slow, very slow, but it works.
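The PYTORCH_ENABLE_MPS_FALLBACK variable tells PyTorch to run ops that are unsupported on MPS on the CPU instead of erroring out. It can also be enabled from Python, as long as it is set before torch is imported; here is a minimal sketch of a hypothetical wrapper script (the run_comfy.py name and the main.py path are assumptions, not part of ComfyUI):

```python
# run_comfy.py -- hypothetical wrapper: enable the MPS CPU fallback
# before torch loads, equivalent to prefixing the shell command.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before torch is imported

import runpy

# Hand control to ComfyUI's entry point (path assumed to be main.py).
runpy.run_path("main.py", run_name="__main__")
```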

Creative-comfyUI avatar Nov 11 '24 19:11 Creative-comfyUI

Same error when using flux1-dev-fp8 with fp8_e4m3fn, but if you change the weight_dtype to "default" (fp16), it works.

michaeldlutlee1 avatar Dec 04 '24 13:12 michaeldlutlee1

Same error with the Flux Schnell Sample Workflow in the Mac Desktop App version 0.3.28 arm64.

ekt1701 avatar Dec 06 '24 01:12 ekt1701

Same error. The above solutions are not working for me.

pneumaticG avatar Dec 10 '24 22:12 pneumaticG

Same error. The above solutions are not working for me.

I'm not using the SD3 model at the moment, but I had the same issue initially when pulling from an existing workflow for the Pixelwave Flux model. I changed the dtype to default and used the dual CLIP mixed, and it worked. [Screenshot 2024-12-13 at 3:46:48 PM]

santawash avatar Dec 13 '24 21:12 santawash

Same error when using flux1-dev-fp8 with fp8_e4m3fn, but if you change the weight_dtype to "default" (fp16), it works.

This worked for me. Thank you!

bigeye-studios avatar Jan 08 '25 14:01 bigeye-studios

Same error when using flux1-dev-fp8 with fp8_e4m3fn, but if you change the weight_dtype to "default" (fp16), it works.

That is indeed the case, but on an Apple M1 Max / 64GB configuration, running a not-particularly-complex workflow causes a black screen and a reboot! And it is very, very slow! If there is no other effective method, I may give up on Flux on the Mac entirely.

yujianzhong3129 avatar Jan 13 '25 13:01 yujianzhong3129

I reviewed the complaints about float8 errors on MPS and found it is a relatively easy problem to fix. What I did was convert all the problematic data types found in a large safetensors file to data types that are more palatable to MPS, i.e. the float16 or bfloat16 it likes to use.

Now that I have my convert2MPS working, I'm thinking: why can't I put it into a Comfy custom node, and have the conversion done when loading the model? I have other issues with Comfy, but if I can get past them, maybe I'll give it a try.
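A minimal sketch of that kind of conversion, assuming the safetensors and torch packages and a PyTorch build with float8 support; the script name and the bfloat16 target are illustrative, not gcr-cormac's actual convert2MPS code:

```python
# convert2mps.py -- hypothetical sketch: rewrite float8 tensors in a
# .safetensors checkpoint as bfloat16 so MPS can load them.
import sys

import torch
from safetensors.torch import load_file, save_file

FLOAT8_DTYPES = {torch.float8_e4m3fn, torch.float8_e5m2}

def convert(src: str, dst: str) -> None:
    tensors = load_file(src)  # loads every tensor onto the CPU
    converted = {
        name: t.to(torch.bfloat16) if t.dtype in FLOAT8_DTYPES else t
        for name, t in tensors.items()
    }
    save_file(converted, dst)

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```

Note that upcasting roughly doubles the file size, since each one-byte float8 weight becomes a two-byte bfloat16 value.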

gcr-cormac avatar Jan 22 '25 20:01 gcr-cormac

Hi gcr-cormac,

I saw your solution to the “Trying to convert Float8_e4m3fn to the MPS backend” error in ComfyUI. I’m using an M1 MacBook Pro, but I’m not very familiar with programming or coding.

Could you please provide step-by-step instructions on how to fix this issue on macOS? I would really appreciate a simple explanation, including any commands I need to run in the Terminal.

Thank you so much for your help! 😊

kidbbc avatar Feb 17 '25 21:02 kidbbc

@gcr-cormac have you tried it , and was it the best option ?

cc : @kidbbc

saurabhthesuperhero avatar Feb 18 '25 16:02 saurabhthesuperhero

Same error when using flux1-dev-fp8 with fp8_e4m3fn, but if you change the weight_dtype to "default" (fp16), it works.

This worked for me. Thank you!

How do I change it?

learnwithexamples avatar Feb 20 '25 02:02 learnwithexamples

Same error when using flux1-dev-fp8 with fp8_e4m3fn, but if you change the weight_dtype to "default" (fp16), it works.

Thank you. It works for me.

xzh0315 avatar Feb 20 '25 07:02 xzh0315

@yujianzhong3129 I ran it on a MacBook Pro M1 Max 32GB and a Mac Studio M2 Ultra 192GB, and found that the pure-black images are caused by running out of VRAM. Try the GGUF-quantized version of the FLUX model.

To summarize: if you get the "FP8_e4m3fn to the MPS backend but it does not have support for that type" error, it is because the native FLUX and SD models do not support Apple's M-series chips well. Changing the FLUX.1 model's weight_dtype to fp16 solves the problem (currently only Nvidia's newest GPUs support FP8 precision), but it increases VRAM usage and may result in pure-black images. If you run out of VRAM, switch to a smaller, quantized model.
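If you are not sure whether a checkpoint actually contains float8 weights, you can inspect it before loading. A minimal sketch assuming the safetensors package; the helper name is hypothetical, and the filename is the one from the log above:

```python
# inspect_dtypes.py -- hypothetical helper: count the tensor dtypes in a
# .safetensors checkpoint to see whether it contains float8 weights.
from collections import Counter

from safetensors import safe_open

def dtype_histogram(path: str) -> Counter:
    counts: Counter = Counter()
    with safe_open(path, framework="pt", device="cpu") as f:
        for name in f.keys():
            counts[str(f.get_tensor(name).dtype)] += 1
    return counts

if __name__ == "__main__":
    print(dtype_histogram("sd3.5_large_fp8_scaled.safetensors"))
```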

Her-77 avatar Feb 25 '25 02:02 Her-77

I got this issue on an M2 MacBook Air with 24 GB of RAM. Here is how I solved it:

  • Get a GGUF model, like https://civitai.com/models/647237?modelVersionId=725532 (I am using 4.1). ComfyUI now uses about 6 GB of RAM, down from 40+; each image takes about 40 minutes to generate.
  • Put the model file in the unet folder.
  • I used ComfyUI Manager to add several modules that used the "gguf" keyword; one of them must have worked.
  • Use the GGUF loader to load the model (nodes > gguf > GGUF loader), and also use the GGUF dual CLIP loader.
  • I read somewhere that the FP8 models need M3 or later; T5/t5xxl_fp16.safetensors worked for me.

Also: to get Comfy to work, after running the installer, cd to the directory, then run: python -m pip install -r requirements.txt

Per the instructions, I had already run:

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh -u
conda install pytorch torchvision torchaudio -c pytorch-nightly
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

mikegrok avatar Mar 03 '25 03:03 mikegrok

Should we run the whole sequence of commands?

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh -u
conda install pytorch torchvision torchaudio -c pytorch-nightly
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

mohalam avatar Mar 10 '25 19:03 mohalam

Same error when using flux1-dev-fp8 with fp8_e4m3fn, but if you change the weight_dtype to "default" (fp16), it works.

Thanks, it works well for me :)

taehyeonEum avatar Apr 23 '25 08:04 taehyeonEum

I got this issue on an M2 MacBook Air with 24 GB of RAM. Here is how I solved it:

  • Get a GGUF model, like https://civitai.com/models/647237?modelVersionId=725532 (I am using 4.1). ComfyUI now uses about 6 GB of RAM, down from 40+; each image takes about 40 minutes to generate.
  • Put the model file in the unet folder.
  • I used ComfyUI Manager to add several modules that used the "gguf" keyword; one of them must have worked.
  • Use the GGUF loader to load the model (nodes > gguf > GGUF loader), and also use the GGUF dual CLIP loader.
  • I read somewhere that the FP8 models need M3 or later; T5/t5xxl_fp16.safetensors worked for me.

Also: to get Comfy to work, after running the installer, cd to the directory, then run: python -m pip install -r requirements.txt

Per the instructions, I had already run:

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh -u
conda install pytorch torchvision torchaudio -c pytorch-nightly
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

I have an M4 and I am running into the same issue.

jrgleason avatar Jun 02 '25 00:06 jrgleason

As far as I know, the root cause is that MPS is incompatible with Float8_e4m3fn.
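The limitation is easy to reproduce outside ComfyUI; a minimal sketch, assuming Apple Silicon and a PyTorch build with float8 support (roughly 2.1 or newer):

```python
import torch

x = torch.zeros(4, dtype=torch.float8_e4m3fn)  # fine: the CPU can hold float8
try:
    x.to("mps")  # the transfer itself fails: MPS has no float8 support
except Exception as e:  # surfaces as a TypeError in recent PyTorch builds
    print(e)  # "Trying to convert Float8_e4m3fn to the MPS backend but ..."
```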

On my M1, I run Comfy with the command python main.py --force-fp16 --use-split-cross-attention --cpu, and it works for me (but be warned: it runs very, very slowly). Here is the reference blog: Solution

BiyuHuang avatar Jul 04 '25 10:07 BiyuHuang

My workaround on Mac M2:

  • download a "GGUF" conversion of the model
  • put the model's .gguf file in the directory ComfyUI/models/unet
  • replace the "Load Diffusion Model" node with "Unet Loader (GGUF)"

Then the workflow can run smoothly on GPU.

h12w avatar Jul 15 '25 10:07 h12w

Hi @h12w, I've got an M3 Mac. I'm new to ComfyUI, trying to get V2V running with the Hunyuan video wrapper, and facing the fp8 issue.

I'm trying your method using GGUF files but need a little (OK, lots of) help with hooking things up. If you have a JSON file to share, that would be great.

Any links shared are much appreciated. :)

peterlchung avatar Jul 17 '25 13:07 peterlchung

https://studywarehouse.com/solution-trying-to-convert-float8_e4m3fn-to-the-mps-backend-but-it-does-not-have-support-for-that-dtype/

nsuedu avatar Aug 11 '25 15:08 nsuedu

This solution worked for me: https://github.com/comfyanonymous/ComfyUI/issues/6995#issuecomment-3024875418

absolutaget avatar Sep 01 '25 10:09 absolutaget