Get Flux working on Apple Silicon
Should fix #1103
We would love to see Flux work on Apple Silicon Macs.
Yup, waiting impatiently to try the quantized versions, or maybe NF4.
Also, shouldn't it be possible to set it to bfloat16 too, or offer it as an option? If I remember right, it is supported on MPS starting with PyTorch 2.3.0. I did try it once in Invoke.
OK, apparently bfloat16 only works on M2 or newer; still would be nice to have it ;)
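As a sketch of how such an option could be gated (all names here are hypothetical; the "M2 or newer, PyTorch 2.3+" constraints are taken from the reports in this thread, not from an official support matrix):

```python
# Hypothetical sketch of dtype selection for the MPS backend, based on the
# constraints mentioned in this thread: bfloat16 on MPS reportedly needs
# PyTorch >= 2.3 and an M2 or newer chip. Not Forge's actual API.

def select_mps_dtype(torch_version: tuple, apple_chip_generation: int) -> str:
    """Pick a computation dtype string for the MPS backend."""
    if torch_version >= (2, 3) and apple_chip_generation >= 2:
        return "bfloat16"   # reportedly works on M2+ with PyTorch 2.3+
    return "float16"        # safe fallback for M1 or older PyTorch

print(select_mps_dtype((2, 4), 3))  # → bfloat16 (M3 on PyTorch 2.4)
print(select_mps_dtype((2, 1), 3))  # → float16 (PyTorch too old)
```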
Tried using bfloat16 on my M3, but got the following error:

```
RuntimeError: "arange_mps" not implemented for 'BFloat16'
```
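A common workaround for missing-kernel errors like this (not necessarily the fix this PR takes) is to build the tensor in a dtype the op does support and cast afterwards; the pattern is easy to check on CPU:

```python
import torch

# "arange_mps" is reportedly not implemented for bfloat16, so instead of
# calling torch.arange(..., dtype=torch.bfloat16) directly, build the range
# in float32 (widely supported) and cast the result. Demonstrated on CPU;
# on MPS the same pattern sidesteps the missing kernel.
dim = 8
scale = torch.arange(0, dim, 2, dtype=torch.float32) / dim
scale = scale.to(torch.bfloat16)

print(scale.dtype)     # torch.bfloat16
print(scale.tolist())  # [0.0, 0.25, 0.5, 0.75]
```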
@DenOfEquity is there anything I should do to get this approved?
I guess none of the active collaborators/maintainers can actually test what's going on with MPS. There's also some confusion in the linked issue about whether it works or not, or only with some models. But it doesn't/can't break anything, so if it helps at least sometimes, I'm calling it progress. I'm also curious whether it couldn't just be float32 all the time; that works for me, with 100% identical results (sample size of 1).
Models I've tested: Schnell, Dev, and GGUF all work OK with this "fix". Arguably the difference is that GGUF is the same as Dev but with less VRAM usage, while Schnell is just different: it resolves with fewer steps but looks like another thing.
NF4 won't work for obvious reasons (bitsandbytes hasn't been ported to Mac).
For me it seems none of the FP8 checkpoints work; I get: `Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.`
Even if I select the FP16 T5.
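A cheap way to fail early on such checkpoints would be to probe dtype support before moving weights to the device. A minimal sketch (the unsupported-dtype set below is inferred from the errors reported in this thread, not an official list, and the helper name is hypothetical):

```python
# Hypothetical pre-flight check: reject dtypes the MPS backend cannot hold,
# instead of failing mid-load with "Trying to convert Float8_e4m3fn ...".
# The set below is inferred from the errors reported in this thread.
MPS_UNSUPPORTED_DTYPES = {"float8_e4m3fn", "float8_e5m2"}

def check_mps_dtype(dtype_name: str) -> None:
    if dtype_name in MPS_UNSUPPORTED_DTYPES:
        raise ValueError(
            f"MPS backend does not support {dtype_name}; "
            "try an FP16/BF16 checkpoint instead."
        )

check_mps_dtype("float16")  # fine, returns silently
try:
    check_mps_dtype("float8_e4m3fn")
except ValueError as e:
    print(e)
```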
The GGUF version I tried did not work either; error: `Unsupported type byte size: UInt16`
The full FP16 Flux works, but it is horribly slow on my M3 Pro: about 6 min for 20 steps.
Also not sure why bfloat16 does not work, even with PyTorch 2.4.1 or nightly.
Update: or maybe it does work, but the code needs to be different.
Actually, if I do this:

```python
if pos.device.type == "mps":
    scale = torch.arange(0, dim, 2, dtype=torch.float16, device=pos.device) / dim
```

and use torch nightly (need to recheck 2.4.1), then bfloat16 seems to work, as I get this: `K-Model Created: {'storage_dtype': torch.bfloat16, 'computation_dtype': torch.bfloat16}`
Took 5:40 on Torch 2.6 nightly.
Works with PyTorch 2.4.1 too.
However, it does not really matter, as the FP8 and the Q4_1 GGUF checkpoints still give the same errors.
Mac M2 +1