Initial GGUF support for flux models
Summary
Support for GGUF quantized models within the FLUX ecosystem
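For context, this builds on the gguf Python package (the same module the new loader imports). Below is a minimal sketch, assuming gguf's GGUFReader API, of how tensors in a .gguf checkpoint can be enumerated before being mapped into the FLUX state dict; it is not InvokeAI's actual loader code.

```python
# Sketch only, not InvokeAI's loader; assumes the `gguf` package's GGUFReader.
from gguf import GGUFReader

def list_gguf_tensors(path: str) -> None:
    reader = GGUFReader(path)
    for tensor in reader.tensors:
        # tensor.data holds the raw (possibly quantized) payload;
        # tensor.tensor_type names the quantization format (Q8_0, Q4_K_S, ...)
        print(tensor.name, tuple(tensor.shape), tensor.tensor_type, tensor.data.nbytes)
```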
QA Instructions
Attempt to install and run multiple GGUF-quantized FLUX models
Merge Plan
Can be merged once it has been thoroughly reviewed and approved
Checklist
- [ ] The PR has a short but descriptive title, suitable for a changelog
- [ ] Tests added / updated (if applicable)
- [ ] Documentation added / updated (if applicable)
I've taken the liberty of adding .gguf to the list of model suffixes that get searched for when scanning a folder for model import.
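Roughly, the suffix scan works like this (a sketch with illustrative names, not InvokeAI's actual constants or functions):

```python
# Illustrative sketch of suffix-based folder scanning; MODEL_SUFFIXES and
# find_model_files are hypothetical names, not InvokeAI's real identifiers.
from pathlib import Path

MODEL_SUFFIXES = {".safetensors", ".ckpt", ".pt", ".bin", ".gguf"}  # ".gguf" is the new addition

def find_model_files(folder: str) -> list[Path]:
    """Recursively collect files whose suffix marks them as importable models."""
    return sorted(p for p in Path(folder).rglob("*") if p.suffix.lower() in MODEL_SUFFIXES)
```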
After further testing, I did find a GGUF quantized model on Civitai that does not load properly:
The URL is https://civitai.com/models/705823/ggufk-flux-unchained-km-quants (warning: NSFW). This is a Q4_K_M model. I guess K_M quantization is not yet supported?
It loads and installs as expected, but when generating gives this error:
[2024-09-21 21:04:25,679]::[InvokeAI]::ERROR --> Error while invoking session 82cd8bc8-9036-41f8-b524-4a2796f279c7, invocation 7e498e87-44e4-4d63-91a7-f9e4e65e6ed2 (flux_denoise): Error(s) in loading state_dict for Flux:
size mismatch for img_in.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([3072, 64]).
[2024-09-21 21:04:25,679]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "/home/lstein/Projects/InvokeAI/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
File "/home/lstein/Projects/InvokeAI/invokeai/app/invocations/baseinvocation.py", line 288, in invoke_internal
output = self.invoke(context)
File "/home/lstein/invokeai-main/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/lstein/Projects/InvokeAI/invokeai/app/invocations/flux_denoise.py", line 88, in invoke
latents = self._run_diffusion(context)
File "/home/lstein/Projects/InvokeAI/invokeai/app/invocations/flux_denoise.py", line 124, in _run_diffusion
transformer_info = context.models.load(self.transformer.transformer)
File "/home/lstein/Projects/InvokeAI/invokeai/app/services/shared/invocation_context.py", line 369, in load
return self._services.model_manager.load.load_model(model, _submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/app/services/model_load/model_load_default.py", line 70, in load_model
).load_model(model_config, submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/load_default.py", line 56, in load_model
locker = self._load_and_cache(model_config, submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/load_default.py", line 77, in _load_and_cache
loaded_model = self._load_model(config, submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/model_loaders/flux.py", line 224, in _load_model
return self._load_from_singlefile(config)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/model_loaders/flux.py", line 248, in _load_from_singlefile
model.load_state_dict(sd, assign=True)
File "/home/lstein/invokeai-main/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Flux:
size mismatch for img_in.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([3072, 64]).
I also tried installing a quantized GGUF-format T5 encoder, and it failed as expected.
K_M-quantized models are definitely supported and working in this PR. That specific model has an incorrectly shaped weight in it; @RyanJDick added a patch to our code to make it work.
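For the curious, the mismatch in the log is a layout difference: 768 x 256 and 3072 x 64 both hold 196,608 values, so the fix amounts to reshaping that one tensor before load_state_dict. A rough sketch of that kind of fix-up follows; the key name and shapes come from the error log, but this is not necessarily the exact patch that landed.

```python
import torch

# Hypothetical state-dict fix-up; key name and shapes are taken from the error
# log above, but the actual patch in InvokeAI may be implemented differently.
def fix_img_in_weight(sd: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    expected = (3072, 64)
    w = sd.get("img_in.weight")
    if w is not None and tuple(w.shape) != expected and w.numel() == expected[0] * expected[1]:
        sd["img_in.weight"] = w.reshape(expected)
    return sd
```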
T5 encoders and other models will be handled in a future PR.
Just a note that after merging locally with the main branch, attempts to render .gguf files result in this error:
Multiple dispatch failed for 'torch._ops.aten.silu.default'; all __torch_dispatch__ handlers returned NotImplemented:
- tensor subclass <class 'invokeai.backend.quantization.gguf.ggml_tensor.GGMLTensor'>
In its current state, this PR requires you to run pip install to update your version of torch. Investigating removing that requirement, though.
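For anyone hitting the same thing: the error means the GGMLTensor subclass's __torch_dispatch__ declined to handle aten.silu and no other handler stepped in. Here is a toy sketch of the general pattern (not InvokeAI's actual class), where ops that are not special-cased fall back to the dense tensor instead of returning NotImplemented:

```python
import torch
from torch.utils._pytree import tree_map

# Toy stand-in for a GGMLTensor-like wrapper; not InvokeAI's implementation.
class DequantFallbackTensor(torch.Tensor):
    # Skip __torch_function__ so everything routes through __torch_dispatch__.
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, dense: torch.Tensor):
        return torch.Tensor._make_wrapper_subclass(cls, dense.shape, dtype=dense.dtype)

    def __init__(self, dense: torch.Tensor):
        self._dense = dense  # a real GGMLTensor would dequantize lazily here

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Returning NotImplemented for ops like aten.silu reproduces the error above;
        # unwrapping to the dense tensor and delegating avoids it.
        unwrap = lambda x: x._dense if isinstance(x, cls) else x
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

x = DequantFallbackTensor(torch.randn(4))
print(torch.nn.functional.silu(x))  # aten.silu now resolves via the fallback path
```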
Just ran the following manual tests:
- Speed / compatibility:
  - Non-quantized: 2.08 it/s
  - GGUF Q8_0: 1.61 it/s
  - GGUF Q4_K_S: 1.24 it/s
  - GGUF Q2_K: 1.27 it/s
  - bnb NF4: 1.89 it/s
- FLUX LoRAs can be applied on top of GGUF models
- Smoke tested a bunch of stuff after torch bump (SD1, SDXL, LoRA, ControlNet, IP-Adapter)
Cool!
I'm getting the following error, tried v5.1.0rc4 and v5.1.0rc3 (via Stability Matrix):
File "C:\Users\virat\AppData\Roaming\StabilityMatrix\Packages\InvokeAI\invokeai\backend\quantization\gguf\loaders.py", line 3, in
import gguf ModuleNotFoundError: No module named 'gguf'
Any suggestions on how I can resolve this on my end?
Looks like Stability Matrix isn't installing the gguf module. I would reach out to them.
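As a stopgap until Stability Matrix ships the dependency, installing it into that package's own Python environment should work. A hedged sketch follows; run it with the interpreter from the InvokeAI package's venv, not your system Python.

```python
# Workaround sketch: install the missing gguf dependency into whatever interpreter
# Stability Matrix uses to launch InvokeAI (i.e. run this with that venv's python).
import importlib.util
import subprocess
import sys

if importlib.util.find_spec("gguf") is None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "gguf"])
```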