[Feature Request]: add a "--int8" function

Open · Micraow opened this issue 1 year ago • 13 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

I see that we have a command line flag "--no-half" which keeps the model in fp32 instead of converting it to fp16. I think we could also add a "--int8" option to quantize the model to int8, which would help us run some big models on devices with low memory. For example, when I try to run DreamshaperXL on my computer with 8GB RAM + 10GB swap, it always triggers an OOM; with int8, maybe we could make it work. We could also add an option to save the converted model to disk so that we don't have to convert it every time.

Proposed workflow

  1. Start the webui like this: webui.sh --int8 .....

  2. Input a prompt and start generating.

  3. WebUI quantizes the model to int8 and runs fast! (See the sketch below.)
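
For illustration, a minimal sketch of the kind of conversion step 3 describes, using PyTorch's built-in dynamic int8 quantization (CPU-only); the tiny module below is a stand-in for the UNet/transformer blocks webui would actually convert:

```python
# Minimal sketch: dynamic int8 quantization with stock PyTorch (CPU-only).
# TinyBlock is illustrative; webui would target the real model's layers.
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(320, 320)

    def forward(self, x):
        return self.proj(x)

model = TinyBlock().eval()

# Linear weights become int8; activations stay float and are quantized
# on the fly at inference time.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 77, 320)
print(model_int8(x).shape)  # torch.Size([1, 77, 320])
```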

Additional information

No response

Micraow avatar Jan 03 '24 15:01 Micraow

I can consider adding bitsandbytes int8/nf4 layers for the transformer blocks in SD and letting them work the way the FP8 support works.

Don't know what you think about my approach.

KohakuBlueleaf avatar Jan 08 '24 08:01 KohakuBlueleaf
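
(A rough sketch of what that module swap could look like, assuming bitsandbytes is installed and a CUDA GPU is available, since its int8 kernels are GPU-only; the helper below is illustrative, not webui code:)

```python
# Rough sketch, not webui code: recursively swap nn.Linear layers for
# bitsandbytes' Linear8bitLt. Requires bitsandbytes and a CUDA GPU.
import torch.nn as nn
import bitsandbytes as bnb

def swap_linear_for_int8(module: nn.Module) -> None:
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            int8_layer = bnb.nn.Linear8bitLt(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                has_fp16_weights=False,  # keep weights stored as int8
            )
            # Wrap the existing weight; bitsandbytes quantizes it to int8
            # when the layer is moved to the GPU.
            int8_layer.weight = bnb.nn.Int8Params(
                child.weight.data, requires_grad=False
            )
            if child.bias is not None:
                int8_layer.bias = child.bias
            setattr(module, name, int8_layer)
        else:
            swap_linear_for_int8(child)
```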

I can consider adding bitsandbytes int8/nf4 layers for the transformer blocks in SD and letting them work the way the FP8 support works.

Don't know what you think about my approach.

BTW, don't know if you know, but we already have 8-bit model weight support.

KohakuBlueleaf avatar Jan 08 '24 08:01 KohakuBlueleaf

Are there 8-bit safetensors of SDXL or SD models available, then?

Manchovies avatar Jan 09 '24 00:01 Manchovies

Are there 8-bit safetensors of SDXL or SD models available, then?

safetensors hasn't merged the fp8 support PR yet.

KohakuBlueleaf avatar Jan 09 '24 04:01 KohakuBlueleaf

Are there 8-bit safetensors of SDXL or SD models available, then?

It seems there are very few, so I hope webui can convert an fp16/fp32 model to int8 and run it on CPU.

I can consider adding bitsandbytes int8/nf4 layers for the transformer blocks in SD and letting them work the way the FP8 support works.

Don't know what you think about my approach.

Excuse me, I don't know much about deep learning, so I may have misunderstood you, but will FP8 work on CPU? Anyway, if this can reduce the size of the model and make it run faster, then it's good.

Micraow avatar Jan 12 '24 05:01 Micraow

@KohakuBlueleaf the fp8 support has been merged in safetensors. Using DrawThings on my MacBook, 8-bit models are useful indeed, and the program lets you import any safetensors model and convert it to an 8-bit version to save memory. Having this in automatic1111 would be nice indeed.

wobba avatar Mar 05 '24 09:03 wobba
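
A standalone convert script along those lines could be quite small. A minimal sketch, assuming PyTorch >= 2.1 (for torch.float8_e4m3fn) and a safetensors release that includes the merged fp8 support; the file names are made up:

```python
# Minimal sketch of an fp16 -> fp8 convert script. Assumes PyTorch >= 2.1
# and a safetensors build with fp8 support; file names are illustrative.
import torch
from safetensors.torch import load_file, save_file

state_dict = load_file("model_fp16.safetensors")

# Cast floating-point tensors to fp8 (e4m3); leave everything else alone.
fp8_state = {
    k: v.to(torch.float8_e4m3fn) if v.is_floating_point() else v
    for k, v in state_dict.items()
}

save_file(fp8_state, "model_fp8.safetensors")
```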

@KohakuBlueleaf the fp8 support has been merged in safetensors. Using DrawThings on my MacBook, 8-bit models are useful indeed, and the program lets you import any safetensors model and convert it to an 8-bit version to save memory. Having this in automatic1111 would be nice indeed.

You mean you want a convert utility for making fp8 safetensors files?

Or for loading fp8 safetensors?

KohakuBlueleaf avatar Mar 05 '24 09:03 KohakuBlueleaf

Both :) Since most models are full precision when you download them, having the ability to convert them to 8-bit and then use those would be a nice addition.

This is what it looks like in the model picker in DrawThings, as an example. I would be OK with a separate conversion script as well, provided automatic1111 loads the 8-bit model just fine.

[screenshot of the DrawThings model picker]

wobba avatar Mar 05 '24 10:03 wobba

Both :) Since most models are full precision when you download them, having the ability to convert them to 8-bit and then use those would be a nice addition.

This is what it looks like in the model picker in DrawThings, as an example. I would be OK with a separate conversion script as well, provided automatic1111 loads the 8-bit model just fine.

[screenshot of the DrawThings model picker]

A1111 doesn't have an 8-bit conversion toolkit, but you can load fp8 safetensors directly.

Kohaku XL Gamma also has an fp8 model file.

KohakuBlueleaf avatar Mar 05 '24 11:03 KohakuBlueleaf
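
Loading goes through the normal safetensors path. A small sketch, assuming a PyTorch build with fp8 dtypes (>= 2.1) and an illustrative file name; hardware without native fp8 compute would upcast the weights for the actual math, which is roughly how the existing FP8 weight option behaves:

```python
# Small sketch: load an fp8 checkpoint and upcast to fp16 for compute.
# Assumes PyTorch >= 2.1; the file name is made up.
import torch
from safetensors.torch import load_file

fp8_state = load_file("kohaku-xl-gamma-fp8.safetensors")

compute_state = {
    k: v.to(torch.float16) if v.dtype == torch.float8_e4m3fn else v
    for k, v in fp8_state.items()
}
```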

Found this while searching. How did this thread morph from talking about int8 to talking about fp8?

Also, would it be possible to leverage the gguf format for int8 quantization?

dhelgerson avatar Apr 09 '24 14:04 dhelgerson

Found this while searching. How did this thread morph from talking about int8 to talking about fp8?

Also, would it be possible to leverage the gguf format for int8 quantization?

Yes, you can use gguf to store quantized weights on disk, but that's not what we want in this thread.

KohakuBlueleaf avatar Apr 09 '24 14:04 KohakuBlueleaf
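
(To make the int8 side of the question concrete: independent of the on-disk container, int8 weight quantization typically stores one scale per tensor or per channel. A minimal sketch of the symmetric per-tensor scheme:)

```python
# Minimal sketch of symmetric per-tensor int8 quantization; the container
# format (safetensors, gguf, ...) only affects how q and scale are stored.
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                     # one scale per tensor
    q = (w / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(320, 320)
q, scale = quantize_int8(w)
print((dequantize_int8(q, scale) - w).abs().max())  # error is at most scale/2
```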

Found this while searching. How did this thread morph from talking about int8 to talking about fp8? Also, would it be possible to leverage the gguf format for int8 quantization?

Yes, you can use gguf to store quantized weights on disk, but that's not what we want in this thread.

Any plan for supporting int8 models?

bigmover avatar May 14 '24 08:05 bigmover

Has anyone found an answer?

Lionnn107 avatar Feb 18 '25 02:02 Lionnn107