Alex Redden

Results: 82 comments by Alex Redden

1. You can optionally quantize the others by setting `"quantize_flow_embedder_layers": true`, but doing so reduces quality pretty considerably and doesn't save much vram or increase it/s. The non-single-or-double-block layers...
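
A minimal sketch of toggling that flag, assuming the repo's JSON-config layout; the `configs/config-dev.json` path is an assumption, only `quantize_flow_embedder_layers` is taken from the comment above:

```python
import json

# Hypothetical config path; check the actual filename in your flux-fp8-api checkout.
config_path = "configs/config-dev.json"

with open(config_path) as f:
    config = json.load(f)

# Key quoted in the comment above: quantizes the non-single-or-double-block layers.
# Per the comment, expect a noticeable quality drop for little VRAM or it/s benefit.
config["quantize_flow_embedder_layers"] = False

with open(config_path, "w") as f:
    json.dump(config, f, indent=4)
```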

Not really- it has enough SRAM that it gets the same TFLOPS for fp16 w/ fp32 accumulate as it does for fp16 w/ fp16 accumulate. @spejamas
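
If you want to compare the two accumulation modes on your own card, PyTorch exposes a backend switch for it; this is just a generic sketch, not anything from the repo:

```python
import torch

# False forces fp32 accumulation for fp16 matmuls; True (the default) allows
# fp16 accumulation. Whether the two settings reach the same TFLOPS is a
# property of the GPU, which is what the comment above is getting at.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
c = a @ b  # fp16 inputs, fp32 accumulation inside the kernel
```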

I'm sure it would work with tinkering, not sure how different the architecture is, but if it's similar to normal flux then it will work. If it's more of a...

Ah that's interesting. Are you using the latest code? There was a bug earlier where it was always setting the lora alpha to 1.0 for huggingface diffusers loras. Though it...
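
For context, a sketch of why a forced alpha of 1.0 matters; this is a hypothetical helper, not the repo's `lora_loading.py`:

```python
import torch

def merge_lora(weight: torch.Tensor, lora_A: torch.Tensor, lora_B: torch.Tensor,
               alpha: float, rank: int) -> torch.Tensor:
    # Standard LoRA merge: W' = W + (alpha / rank) * (B @ A).
    # If alpha is silently forced to 1.0, the delta gets scaled by 1/rank
    # instead of alpha/rank, so the lora's effect comes out much weaker.
    return weight + (alpha / rank) * (lora_B @ lora_A)
```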

It's possible that there are some lora loading specifics that I didn't implement well- but I'm not entirely sure what that would be. I will have to look into other...

If you're getting black images I would recommend setting flow_dtype to bfloat16; it should help a bit. I'm still a bit unsure as to how I am supposed to handle...
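
The bfloat16 suggestion comes down to dynamic range; a tiny illustration (the value here is made up just to show the overflow):

```python
import torch

x = torch.full((3,), 70000.0)    # activation magnitude past float16's ~65504 max
print(x.to(torch.float16))       # overflows to inf -> NaNs downstream -> black image
print(x.to(torch.bfloat16))      # stays finite; bf16 trades precision for fp32's range
```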

Thanks 😄 - well, if you find an issue anywhere in my lora loading implementation here https://github.com/aredden/flux-fp8-api/blob/main/lora_loading.py let me know and I'll change it, or you can submit a pull request and...

Seems like most of the delay is actually synchronization, which sort of implies that the slowdown is coming from something else earlier in the code. The way torch works is each...
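
For anyone profiling this: CUDA kernels are queued asynchronously, so a host-side timer tends to blame whatever op happens to block first rather than the op that is actually slow. CUDA events plus an explicit sync give a truer picture (generic sketch, not code from the repo):

```python
import torch

x = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
y = x @ x                    # queued on the GPU; the Python call returns immediately
end.record()
torch.cuda.synchronize()     # block until all queued work is done before reading timers
print(f"matmul: {start.elapsed_time(end):.2f} ms")
```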

Actually, it could be because you may have set autoencoder offloading to true- in that case the slowdown could be moving the vae to gpu, encoding,...
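
Roughly what an offloaded-autoencoder step looks like (hypothetical helper, not the repo's actual implementation); the two weight transfers, not the autoencoder itself, are often what dominates the per-image time:

```python
import torch

def decode_with_offload(vae, latents: torch.Tensor) -> torch.Tensor:
    # Move the autoencoder's weights to the GPU, run it, then move them back to CPU.
    vae.to("cuda")
    with torch.no_grad():
        images = vae.decode(latents.to("cuda"))
    vae.to("cpu")
    torch.cuda.empty_cache()
    return images
```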

I'm not entirely sure what the slowdown would be- though an L4 has pretty low wattage limits, so it might be throttling because of that. I would...
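
One quick way to check for power throttling (assumes the `nvidia-ml-py`/`pynvml` package; GPU index 0 is an assumption):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Both values are reported in milliwatts.
draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000
print(f"drawing {draw_w:.0f} W of a {limit_w:.0f} W enforced limit")

pynvml.nvmlShutdown()
```

If the draw sits pinned at the limit during generation, clocks (and it/s) will drop accordingly.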