InvokeAI
[bug]: 5.0 release ignores quantization
Is there an existing issue for this problem?
- [X] I have searched the existing issues
Operating system
Windows
GPU vendor
Nvidia (CUDA)
GPU model
RTX 4090
GPU VRAM
24 GB
Version number
5.0
Browser
Chrome
Python dependencies
No response
What happened
Loading FP8 models uses the same amount of VRAM as loading the full, unquantized versions of FLUX, capping out my 24 GB.
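A minimal sketch, assuming a standard PyTorch + CUDA environment, of one way to check the VRAM actually occupied before and after the model is loaded (the `report_vram` helper is illustrative only, not part of InvokeAI):

```python
import torch

def report_vram(tag: str) -> None:
    """Print currently allocated and reserved CUDA memory in GB."""
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"[{tag}] allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")

# Example usage:
# report_vram("before load")
# ... load the FP8 / quantized FLUX model and run a generation ...
# report_vram("after load")
```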
What you expected to happen
It should use about 20 GB or less, depending on which of the quantized (Q) models I choose.
How to reproduce the problem
No response
Additional context
No response
Discord username
No response