ComfyUI FLUX Issue | MPS framework doesn't support float64

Expected Behavior

Run the inference

Actual Behavior

After 273.31 seconds, it throws an exception

Steps to Reproduce

Upload the example workflow for DEV version https://comfyanonymous.github.io/ComfyUI_examples/flux/

Debug Logs

!!! Exception during processing!!! Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
Traceback (most recent call last):
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy_extras/nodes_custom_sampler.py", line 612, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/k_diffusion/sampling.py", line 143, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-TiledDiffusion/.patches.py", line 4, in calc_cond_batch
    return calc_cond_batch_original_tiled_diffusion_91e66834(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 228, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py", line 64, in apply_model_uncond_cleanup_wrapper
    return orig_apply_model(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/model_base.py", line 121, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 135, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 112, in forward_orig
    pe = self.pe_embedder(ids)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/layers.py", line 21, in forward
    [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/math.py", line 16, in rope
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Other

No response

Aug 01 '24 16:08 alexgenovese

https://github.com/comfyanonymous/ComfyUI/commit/48eb1399c02bdae7e14b2208c448b69b382d0090

can you check if this fixes it.

Aug 01 '24 17:08 comfyanonymous

On Mac it seems to run with default settings, but just gets a black image output. If I change it fp8 as mentioned above then Mac says MPS doesn't support that.

Aug 01 '24 20:08 mhale1

On Mac it seems to run with default settings, but just gets a black image output. If I change it fp8 as mentioned above then Mac says MPS doesn't support that.

How much RAM do you have? For some reason both original and fp8 models are taking around 40+gb. Is it the same for you?

Aug 01 '24 23:08 tombearx

@tombearx I have a 64 GB M1 Mac and a 16 GB 3080 on my Windows machine. Use the Mac more at work so was trying there first.

Aug 02 '24 03:08 mhale1

It probably won't help fix it. But when I enable preview, I can see that as the image is generating, it is adding new stripes to the top of the image and the actual image may be shifting down by a corresponding amount.

I also am running on an M3 Max with 128GB ram. Flux won't run at 8-bit at all, comfy gives an error. The T5 model runs at 8 or 16 but that doesn't help with this issue. I updated pytorch to the current daily build of 2.5.0 wich also did not help.

Aug 02 '24 07:08 ghogan42

If you're trying to run this model on a Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation. Here is what I get with the unmodified example workflow on a 64GB M1 Max with torch 2.3.1, using the latest ComfyUI commit as of this post and the Flux Dev model (with the fp16 T5 text encoder, t5xxl_fp16.safetensors): ComfyUI_00104_

Aug 02 '24 09:08 brkirch

this workflow is working on my m3/128 https://civitai.com/models/617060/comfyui-workflow-for-flux-simple

Aug 02 '24 11:08 twalderman

OK guys i pruned the weights, theyre now 11GB and no quality loss, it loads up faster, takes way less space in VRAM... Not sure why they were not relased pruned this way. They are loaded in 8bit still tho, i believe should be in 16, can fp16 be enabled in loader as well ? Cause when i tried to add fp16 on my own, i think it loaded as default and generation was very slow... compared to 8.

class UNETLoader: @classmethod def INPUT_TYPES(s): return {"required": { "unet_name": (folder_paths.get_filename_list("unet"), ), "weight_dtype": (["default", "fp16", "fp8_e4m3fn", "fp8_e5m2"],) }} RETURN_TYPES = ("MODEL",) FUNCTION = "load_unet"
CATEGORY = "advanced/loaders"

def load_unet(self, unet_name, weight_dtype):
    weight_dtype = {"default": None, 
                    "fp16": torch.float16,
                    "fp8_e4m3fn": torch.float8_e4m3fn, 
                    "fp8_e5m2": torch.float8_e4m3fn}[weight_dtype]
    unet_path = folder_paths.get_full_path("unet", unet_name)
    model = comfy.sd.load_unet(unet_path, dtype=weight_dtype)
    return (model,)

Can you explain how to prune it? Where to add? Sorry if it is noob question.

** Sorry, I git pulled and checked the code. All clear. Thanks!

Aug 02 '24 12:08 QueryType

Well, first shot did not work. I am on torch 2.3.1, Mac M2, 24GB. I loaded the schnell, fp8_e4m3fn. As is seen it does not use MPS and triggered a 5GB swap. I think I will wait for fixes to flow in.

Requested to load Flux Loading 1 new model python(4803) MallocStackLogging: can't turn off malloc stack logging because it was not enabled. 0%| | 0/4 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) 0%| | 0/4 [00:04<?, ?it/s] !!! Exception during processing!!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype. Traceback (most recent call last): File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 152, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 82, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 75, in map_node_over_list results.append(getattr(obj, func)(**slice_dict(input_data_all, i))) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 612, in sample samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 716, in sample output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 695, in inner_sample samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 600, in sample samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/utils/contextlib.py", line 115, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/k_diffusion/sampling.py", line 143, in sample_euler denoised = model(x, sigma_hat * s_in, **extra_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 299, in call out = self.inner_model(x, sigma, model_options=model_options, seed=seed) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 682, in call return self.predict_noise(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 685, in predict_noise return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 279, in sampling_function out = calc_cond_batch(model, conds, x, timestep, model_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 228, in calc_cond_batch output = model.apply_model(input_x, timestep, **c).chunk(batch_chunks) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/model_base.py", line 121, in apply_model model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ldm/flux/model.py", line 143, in forward out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ldm/flux/model.py", line 101, in forward_orig img = self.img_in(img) ^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 63, in forward return self.forward_comfy_cast_weights(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 58, in forward_comfy_cast_weights weight, bias = cast_bias_weight(self, input) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 39, in cast_bias_weight bias = cast_to(s.bias, dtype, device, non_blocking=non_blocking) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 24, in cast_to return weight.to(device=device, dtype=dtype, non_blocking=non_blocking) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

Prompt executed in 218.28 seconds`

Aug 02 '24 13:08 QueryType

If you're trying to run this model on a Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation.

Yep. This is the way. Downgrading to these versions fixes generation for me on my m3 max based macbook.

Aug 02 '24 17:08 ghogan42

~~Still no luck yet on my M1 Max even after the torch downgrades.~~ I take that back. Just pulled latest from this morning (just the clip_l encoder change?), and that combined with the earlier torch downgrade did fix it.

Aug 02 '24 17:08 mhale1

the latest MPS nightly is working for me.

Aug 02 '24 22:08 twalderman

Unless pytorch support Float8_e4m3fn dtypes for MPS backends, people with less than 32GB unified memory should forget to run these locally on Apple Silicon.

Aug 03 '24 06:08 QueryType

the latest MPS nightly is working for me.

Nightly is still broken for me. 2.3 downgrade works.

$RefractAI avatar$ Aug 03 '24 11:08 RefractAI

I tried latest nightly. It "works" when using the normal cfgguider node, but is extremely blurry. Using basic guider + flux guidance node leads to noise. ComfyUI_00002_ ComfyUI_00004_ ComfyUI_00005_

[Edit] Confirmed that downgraded torch does work, though you need basic guider + flux guidance node. Cfgguider node still produces blurry output. ComfyUI_00013_ ComfyUI_00014_ ComfyUI_00015_ ComfyUI_00016_ (Image pairs differ in scheduler between euler and bosh3 (custom ODE scheduler).)

Aug 03 '24 12:08 Adreitz

Has anyone seen value to the new guider for flux? If so I will downgrade to try it. With the nightly I'm getting nice output with guidance of 1.

Aug 03 '24 18:08 twalderman

Unless pytorch support Float8_e4m3fn dtypes for MPS backends, people with less than 32GB unified memory should forget to run these locally on Apple Silicon.

Can't manage to run it even on a 32GB M1 Max. Has anyone succeed?

Aug 03 '24 20:08 tombearx

@twalderman ~~I just tested and there might be something wrong with the guidance. I'm not seeing any difference between scale 1.0 and scale 4.5. Literally zero, when I subtract one image from the other.~~ Nevermind, comfy messed up somehow. How exactly did you get things working with torch nightlies?

Aug 03 '24 21:08 Adreitz

I didnt do anything unusual. I tested with the nightly and had no issues so I didnt revert back again. I have been generating images all day.

Aug 03 '24 23:08 twalderman

@twalderman Weird. What OS version are you using?

Here is an example of the differences you could expect from changing the guidance scale (1.0 - 4.0 in steps of 0.5; 4.5 is above; all using bosh3 sampler).

ComfyUI_00021_ ComfyUI_00023_ ComfyUI_00025_ ComfyUI_00027_ ComfyUI_00029_ ComfyUI_00031_ ComfyUI_00033_

Aug 03 '24 23:08 Adreitz

can you share your workflow? On my Max M1 it runs for 10 min and the pic is noisy.

Aug 04 '24 02:08 RainbowBull

can you share your workflow? On my Max M1 it runs for 10 min and the pic is noisy.

I used workflow from previous picture. I have around 90-100s/it probably because bf16 is not supported directly and model using much more ram (and swap) than it should.

Aug 04 '24 03:08 tombearx

Unless pytorch support Float8_e4m3fn dtypes for MPS backends, people with less than 32GB unified memory should forget to run these locally on Apple Silicon.

Can't manage to run it even on a 32GB M1 Max. Has anyone succeed?

It is a bit of a bad situation for us. I am at 24G cannot even dream.

Aug 04 '24 12:08 QueryType

@Adreitz i am using the latest sequoia beta.

Aug 04 '24 14:08 twalderman

Unless pytorch support Float8_e4m3fn dtypes for MPS backends, people with less than 32GB unified memory should forget to run these locally on Apple Silicon.

Can't manage to run it even on a 32GB M1 Max. Has anyone succeed?

It is a bit of a bad situation for us. I am at 24G cannot even dream.

Looks like RAM issue arise due to the fact that text encoders hasn't unloaded from RAM on MPS. I opened the issue: https://github.com/comfyanonymous/ComfyUI/issues/4201

Aug 04 '24 14:08 tombearx

If you're trying to run this model on a Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation. Here is what I get with the unmodified example workflow on a 64GB M1 Max with torch 2.3.1, using the latest ComfyUI commit as of this post and the Flux Dev model (with the fp16 T5 text encoder, t5xxl_fp16.safetensors):

perfect solution !

Aug 04 '24 19:08 dreamrec

how long does it takes to generate 1 image? Mine takes 10 min

Aug 04 '24 19:08 RainbowBull

how long does it takes to generate 1 image? Mine takes 10 min

M3 max 64gb takes 210s (1024x1024 30 steps)

Aug 04 '24 23:08 dreamrec

how long does it takes to generate 1 image? Mine takes 10 min

Just under 5 min on M2 Max 64gb at 1024 @ 20 steps

Aug 05 '24 19:08 timothyallan

i have m1 max 32gb but it still takes 10 min

@dreamrec can you share the workflow again? its private so i cant see it.

Aug 05 '24 20:08 RainbowBull