Inconsistent performance between flux and SDXL with precision half and all-in-fp32 on AMD RDNA2

Open derfasthirnlosenick opened this issue 1 year ago • 0 comments

I get a massive speedup on my AMD 6800 XT for Flux by launching with --all-in-fp32 rather than --precision half (on the nf4 model; I still need to try others). For SDXL, the same change has the opposite effect at roughly the same magnitude (almost half as fast).

- Flux with --precision half: 12-14 s/it
- Flux with --all-in-fp32: 7.something s/it
- SDXL with --precision half: 1.55 it/s (about 0.64 s/it)
- SDXL with --all-in-fp32: 1.1 s/it

(All for a 1152x896 image with >20 steps).
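To put the figures above on a common scale, here is a rough Python sketch that converts everything to seconds per iteration and computes the ratios. The variable names are made up for illustration, and "7.something" is treated as the interval from 7 to 8 s/it, which is an assumption rather than a measured value.

```python
def its_to_sit(it_per_s: float) -> float:
    """Convert iterations per second to seconds per iteration."""
    return 1.0 / it_per_s


# SDXL: reported as 1.55 it/s with --precision half, 1.1 s/it with --all-in-fp32.
sdxl_half = its_to_sit(1.55)
sdxl_fp32 = 1.1
sdxl_slowdown = sdxl_fp32 / sdxl_half  # how many times slower fp32 is

# Flux (nf4): reported as 12-14 s/it with --precision half.
flux_half = (12.0, 14.0)
# ASSUMPTION: "7.something s/it" is bracketed as 7 to 8 s/it.
flux_fp32 = (7.0, 8.0)
flux_speedup_min = flux_half[0] / flux_fp32[1]  # most pessimistic pairing
flux_speedup_max = flux_half[1] / flux_fp32[0]  # most optimistic pairing

print(f"SDXL half: {sdxl_half:.3f} s/it, fp32 is {sdxl_slowdown:.2f}x slower")
print(f"Flux fp32 is {flux_speedup_min:.2f}x to {flux_speedup_max:.2f}x faster")
```

Under those assumptions, the SDXL slowdown and the Flux speedup land in a similar range, which matches the "pretty much the same magnitude" impression.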

I still want to test with Flux models other than the nf4, but this is very unexpected (the nf4 performance is in line with what I got from fp8 a while back).

Am I taking crazy pills? And if it has to be this way for some reason, is there a way to switch the precision without restarting Forge?

derfasthirnlosenick, Sep 04 '24 06:09