stable-diffusion-webui
[Feature Request]: add a "--int8" function
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
I see that we have a command line flag "--no-half" which automatically turns fp16 into fp32. I think we could also add a "--int8" option to convert the model to int8, which would help us run some big models on devices with low memory. For example, when I try to run DreamshaperXL on my 8GB RAM + 10GB swap computer, it always triggers an OOM. With int8, maybe we could make it work. Also, we could add an option to save the converted model to disk so that we don't have to convert it every time.
Proposed workflow
1. Start the webui like this: webui.sh --int8 .....
2. Input a prompt and start generating.
3. The webui quantizes the model to int8 and runs fast!
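There is no such flag today, but as a rough sketch of what a hypothetical "--int8" CPU path could do, PyTorch's built-in dynamic quantization can be applied to the UNet's Linear layers (convolutions would stay in fp32, and the attribute path below is only illustrative):

```python
# Hypothetical sketch only -- the webui has no "--int8" flag today.
# PyTorch dynamic quantization runs on CPU and only covers nn.Linear layers,
# so the convolution-heavy parts of the UNet stay in fp32.
import torch

def quantize_unet_int8(unet: torch.nn.Module) -> torch.nn.Module:
    unet = unet.to("cpu").float()  # dynamic quantization is CPU-only, fp32 in
    return torch.ao.quantization.quantize_dynamic(
        unet, {torch.nn.Linear}, dtype=torch.qint8
    )

# quantized = quantize_unet_int8(sd_model.model.diffusion_model)  # illustrative path
# torch.save(quantized.state_dict(), "unet-int8.pt")              # optional disk cache
```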
Additional information
No response
I can consider adding a bitsandbytes int8/nf4 layer for the transformer blocks in SD and letting it work the same way FP8 does.
Let me know what you think of this approach.
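As a rough, untested sketch of that idea (the function name is mine, and bitsandbytes' 8-bit path requires a CUDA GPU, so it would not help the CPU-only case above):

```python
# Illustrative sketch, not webui code: swap nn.Linear inside the transformer
# blocks for bitsandbytes 8-bit linears. Requires CUDA; the weights are
# actually quantized when the module is moved to the GPU.
import torch
import bitsandbytes as bnb

def swap_linear_to_int8(module: torch.nn.Module, threshold: float = 6.0) -> None:
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear):
            new = bnb.nn.Linear8bitLt(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                has_fp16_weights=False,
                threshold=threshold,
            )
            new.weight = bnb.nn.Int8Params(child.weight.data, requires_grad=False)
            if child.bias is not None:
                new.bias = child.bias
            setattr(module, name, new)
        else:
            swap_linear_to_int8(child, threshold)

# swap_linear_to_int8(unet)   # then unet.to("cuda") quantizes the weights
```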
BTW, I don't know if you're aware, but we already have 8-bit (fp8) model weight support.
Are there 8 bit safetensors of SDXL or SD models otherwise?
safetensors hasn't merged the fp8 support PR yet.
It seems there are very few, so I hope the webui can convert an fp16/fp32 model to int8 and run it on CPU.
Excuse me, I don't know much about deep learning, so I may have misunderstood you, but will FP8 work on CPU? Anyway, if this can reduce the size of the model and make it run faster, then it's good.
@KohakuBlueleaf the fp8 support has been merged in safetensors. Using DrawThings on my MacBook, 8-bit models are indeed useful, and the program lets you import any safetensors model and convert it to an 8-bit version to save memory. Having this in automatic1111 would be nice indeed.
You mean you want a conversion utility for making fp8 safetensors files?
Or for loading fp8 safetensors?
Both :) As most models are full precision when you download them, having the ability to convert to 8-bit and then use those would be a nice addition.
This is what it looks like in the model picker in DrawThings, as an example. I would be OK with a separate conversion script as well, if automatic1111 would load the 8-bit model just fine.
A1111 doesn't have an 8-bit conversion toolkit, but you can load fp8 safetensors directly.
Kohaku XL Gamma also has an fp8 model file.
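Since there is no built-in conversion tool, a rough sketch of such a script (assumes PyTorch >= 2.1 for torch.float8_e4m3fn and a safetensors release with fp8 support; not an official utility):

```python
# Unofficial sketch: convert a full/half-precision checkpoint to fp8 storage.
# A careful tool would probably keep sensitive layers (e.g. norms) in fp16.
import torch
from safetensors.torch import load_file, save_file

def convert_to_fp8(src: str, dst: str) -> None:
    state = load_file(src)
    out = {}
    for key, tensor in state.items():
        # only downcast floating-point weights; leave everything else untouched
        if tensor.dtype in (torch.float32, torch.float16, torch.bfloat16):
            out[key] = tensor.to(torch.float8_e4m3fn)
        else:
            out[key] = tensor
    save_file(out, dst)

# convert_to_fp8("model.safetensors", "model-fp8.safetensors")
# At load time, cast back up for compute, e.g. tensor.to(torch.float16).
```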
Found this while searching. How did this thread morph from talking about int8 to talking about fp8?
Also, would it be possible to leverage the gguf format for int8 quantization?
Yes. You can use gguf to store quantized weights on disk, but that's not what we want in this thread.
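For reference, the core of int8 weight storage is just a scale per tensor or per channel, regardless of the container format (gguf or otherwise); a minimal sketch:

```python
# Minimal illustration of symmetric int8 quantization (not tied to gguf or the webui).
import torch

def quantize_int8(w: torch.Tensor):
    # per-output-channel scale for a 2D weight: largest magnitude maps to 127
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# q, s = quantize_int8(linear.weight.data)  # store q (int8) + s (fp32) on disk
# w_approx = dequantize_int8(q, s)          # reconstruct at load time
```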
Any plans for supporting int8 models?
Has anyone found an answer?
