stable-diffusion-webui
[Feature Request]: add a "--int8" function
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
I see that we have a command line flag "--no-half" which automatically turns fp16 into fp32. I think we could also add a "--int8" option to convert the model to int8, which would help us run some big models on devices with low memory. For example, when I try to run DreamshaperXL on my 8GB RAM + 10GB swap computer, it always triggers an OOM. With int8, maybe we could make it work. Also, we could add an option to save the converted model to disk so that we don't have to convert it every time.
Proposed workflow
1. Start the webui like this: webui.sh --int8 .....
2. Input a prompt and start generating.
3. The webui quantizes the model to int8 and runs fast!
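There is no such flag today, but as a rough sketch of what a hypothetical "--int8" CPU path could do, PyTorch's built-in dynamic quantization can be applied to the UNet's Linear layers (convolutions would stay in fp32, and the attribute path below is only illustrative):

```python
# Hypothetical sketch only -- the webui has no "--int8" flag today.
# PyTorch dynamic quantization runs on CPU and only covers nn.Linear layers,
# so the convolution-heavy parts of the UNet stay in fp32.
import torch

def quantize_unet_int8(unet: torch.nn.Module) -> torch.nn.Module:
    unet = unet.to("cpu").float()  # dynamic quantization is CPU-only, fp32 in
    return torch.ao.quantization.quantize_dynamic(
        unet, {torch.nn.Linear}, dtype=torch.qint8
    )

# quantized = quantize_unet_int8(sd_model.model.diffusion_model)  # illustrative path
# torch.save(quantized.state_dict(), "unet-int8.pt")              # optional disk cache
```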
Additional information
No response
I can consider adding a bitsandbytes int8/nf4 layer for the transformer blocks in SD and letting it work the same way FP8 does.
Let me know what you think of this approach.
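As a rough, untested sketch of that idea (the function name is mine, and bitsandbytes' 8-bit path requires a CUDA GPU, so it would not help the CPU-only case above):

```python
# Illustrative sketch, not webui code: swap nn.Linear inside the transformer
# blocks for bitsandbytes 8-bit linears. Requires CUDA; the weights are
# actually quantized when the module is moved to the GPU.
import torch
import bitsandbytes as bnb

def swap_linear_to_int8(module: torch.nn.Module, threshold: float = 6.0) -> None:
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear):
            new = bnb.nn.Linear8bitLt(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                has_fp16_weights=False,
                threshold=threshold,
            )
            new.weight = bnb.nn.Int8Params(child.weight.data, requires_grad=False)
            if child.bias is not None:
                new.bias = child.bias
            setattr(module, name, new)
        else:
            swap_linear_to_int8(child, threshold)

# swap_linear_to_int8(unet)   # then unet.to("cuda") quantizes the weights
```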
BTW, I don't know if you're aware, but we already have 8-bit (fp8) model weight support.
Are there 8 bit safetensors of SDXL or SD models otherwise?
safetensors hasn't merged the fp8 support PR yet.
It seems there are very few, so I hope the webui can convert an fp16/fp32 model to int8 and run it on CPU.
Excuse me, I don't know much about deep learning, so I may have misunderstood you, but will FP8 work on CPU? Anyway, if this can reduce the size of the model and make it run faster, then it's good.
@KohakuBlueleaf the fp8 support has been merged in safetensors. Using DrawThings on my MacBook, 8-bit models are indeed useful, and the program lets you import any safetensors model and convert it to an 8-bit version to save memory. Having this in automatic1111 would be nice indeed.
You mean you want a conversion utility for making fp8 safetensors files?
Or for loading fp8 safetensors?
Both :) As most models are full precision when you download them, having the ability to convert to 8-bit and then use those would be a nice addition.
This is what it looks like in the model picker in DrawThings, as an example. I would be OK with a separate conversion script as well, if automatic1111 would load the 8-bit model just fine.
A1111 doesn't have an 8-bit conversion toolkit, but you can load fp8 safetensors directly.
Kohaku XL Gamma also has an fp8 model file.
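Since there is no built-in conversion tool, a rough sketch of such a script (assumes PyTorch >= 2.1 for torch.float8_e4m3fn and a safetensors release with fp8 support; not an official utility):

```python
# Unofficial sketch: convert a full/half-precision checkpoint to fp8 storage.
# A careful tool would probably keep sensitive layers (e.g. norms) in fp16.
import torch
from safetensors.torch import load_file, save_file

def convert_to_fp8(src: str, dst: str) -> None:
    state = load_file(src)
    out = {}
    for key, tensor in state.items():
        # only downcast floating-point weights; leave everything else untouched
        if tensor.dtype in (torch.float32, torch.float16, torch.bfloat16):
            out[key] = tensor.to(torch.float8_e4m3fn)
        else:
            out[key] = tensor
    save_file(out, dst)

# convert_to_fp8("model.safetensors", "model-fp8.safetensors")
# At load time, cast back up for compute, e.g. tensor.to(torch.float16).
```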
Found this while searching. How did this thread morph from talking about int8 to talking about fp8?
Also, would it be possible to leverage the gguf format for int8 quantization?
Yes. You can use gguf to store quantized weights on disk, but that's not what we want in this thread.
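For reference, the core of int8 weight storage is just a scale per tensor or per channel, regardless of the container format (gguf or otherwise); a minimal sketch:

```python
# Minimal illustration of symmetric int8 quantization (not tied to gguf or the webui).
import torch

def quantize_int8(w: torch.Tensor):
    # per-output-channel scale for a 2D weight: largest magnitude maps to 127
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# q, s = quantize_int8(linear.weight.data)  # store q (int8) + s (fp32) on disk
# w_approx = dequantize_int8(q, s)          # reconstruct at load time
```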
Any plans for supporting int8 models?
Has anyone found an answer?
