Initial GGUF support for flux models
Summary
Support for GGUF quantized models within the FLUX ecosystem
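For context, this builds on the gguf Python package (the same module the new loader imports). Below is a minimal sketch, assuming gguf's GGUFReader API, of how tensors in a .gguf checkpoint can be enumerated before being mapped into the FLUX state dict; it is not InvokeAI's actual loader code.

```python
# Sketch only, not InvokeAI's loader; assumes the `gguf` package's GGUFReader.
from gguf import GGUFReader

def list_gguf_tensors(path: str) -> None:
    reader = GGUFReader(path)
    for tensor in reader.tensors:
        # tensor.data holds the raw (possibly quantized) payload;
        # tensor.tensor_type names the quantization format (Q8_0, Q4_K_S, ...)
        print(tensor.name, tuple(tensor.shape), tensor.tensor_type, tensor.data.nbytes)
```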
QA Instructions
Attempt to install and run multiple GGUF-quantized FLUX models
Merge Plan
Can be merged once it has been thoroughly reviewed and approved
Checklist
- [ ] The PR has a short but descriptive title, suitable for a changelog
- [ ] Tests added / updated (if applicable)
- [ ] Documentation added / updated (if applicable)
I've taken the liberty of adding .gguf to the list of model suffixes that get searched for when scanning a folder for model import.
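Roughly, the suffix scan works like this (a sketch with illustrative names, not InvokeAI's actual constants or functions):

```python
# Illustrative sketch of suffix-based folder scanning; MODEL_SUFFIXES and
# find_model_files are hypothetical names, not InvokeAI's real identifiers.
from pathlib import Path

MODEL_SUFFIXES = {".safetensors", ".ckpt", ".pt", ".bin", ".gguf"}  # ".gguf" is the new addition

def find_model_files(folder: str) -> list[Path]:
    """Recursively collect files whose suffix marks them as importable models."""
    return sorted(p for p in Path(folder).rglob("*") if p.suffix.lower() in MODEL_SUFFIXES)
```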
After further testing, I did find a GGUF quantized model on Civitai that does not load properly:
The URL is https://civitai.com/models/705823/ggufk-flux-unchained-km-quants (warning: NSFW). This is a Q4_K_M model. I guess K_M quantization is not yet supported?
It loads and installs as expected, but when generating gives this error:
[2024-09-21 21:04:25,679]::[InvokeAI]::ERROR --> Error while invoking session 82cd8bc8-9036-41f8-b524-4a2796f279c7, invocation 7e498e87-44e4-4d63-91a7-f9e4e65e6ed2 (flux_denoise): Error(s) in loading state_dict for Flux:
size mismatch for img_in.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([3072, 64]).
[2024-09-21 21:04:25,679]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "/home/lstein/Projects/InvokeAI/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
File "/home/lstein/Projects/InvokeAI/invokeai/app/invocations/baseinvocation.py", line 288, in invoke_internal
output = self.invoke(context)
File "/home/lstein/invokeai-main/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/lstein/Projects/InvokeAI/invokeai/app/invocations/flux_denoise.py", line 88, in invoke
latents = self._run_diffusion(context)
File "/home/lstein/Projects/InvokeAI/invokeai/app/invocations/flux_denoise.py", line 124, in _run_diffusion
transformer_info = context.models.load(self.transformer.transformer)
File "/home/lstein/Projects/InvokeAI/invokeai/app/services/shared/invocation_context.py", line 369, in load
return self._services.model_manager.load.load_model(model, _submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/app/services/model_load/model_load_default.py", line 70, in load_model
).load_model(model_config, submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/load_default.py", line 56, in load_model
locker = self._load_and_cache(model_config, submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/load_default.py", line 77, in _load_and_cache
loaded_model = self._load_model(config, submodel_type)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/model_loaders/flux.py", line 224, in _load_model
return self._load_from_singlefile(config)
File "/home/lstein/Projects/InvokeAI/invokeai/backend/model_manager/load/model_loaders/flux.py", line 248, in _load_from_singlefile
model.load_state_dict(sd, assign=True)
File "/home/lstein/invokeai-main/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Flux:
size mismatch for img_in.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([3072, 64]).
I also tried installing a quantized GGUF-format T5 encoder, and it failed as expected.
K_M-quantized models are definitely supported and working in this PR. That specific model has an incorrectly shaped weight in it; @RyanJDick added a patch to our code to make it work.
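For the curious, the mismatch in the log is a layout difference: 768 x 256 and 3072 x 64 both hold 196,608 values, so the fix amounts to reshaping that one tensor before load_state_dict. A rough sketch of that kind of fix-up follows; the key name and shapes come from the error log, but this is not necessarily the exact patch that landed.

```python
import torch

# Hypothetical state-dict fix-up; key name and shapes are taken from the error
# log above, but the actual patch in InvokeAI may be implemented differently.
def fix_img_in_weight(sd: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    expected = (3072, 64)
    w = sd.get("img_in.weight")
    if w is not None and tuple(w.shape) != expected and w.numel() == expected[0] * expected[1]:
        sd["img_in.weight"] = w.reshape(expected)
    return sd
```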
T5 encoders and other models will be handled in a future PR.
Just a note that after merging locally with the main branch, attempts to render .gguf files result in this error:
Multiple dispatch failed for 'torch._ops.aten.silu.default'; all __torch_dispatch__ handlers returned NotImplemented:
- tensor subclass <class 'invokeai.backend.quantization.gguf.ggml_tensor.GGMLTensor'>
In its current state, this PR requires you to run pip install to update your version of torch. Investigating removing that requirement, though.
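For anyone hitting the same thing: the error means the GGMLTensor subclass's __torch_dispatch__ declined to handle aten.silu and no other handler stepped in. Here is a toy sketch of the general pattern (not InvokeAI's actual class), where ops that are not special-cased fall back to the dense tensor instead of returning NotImplemented:

```python
import torch
from torch.utils._pytree import tree_map

# Toy stand-in for a GGMLTensor-like wrapper; not InvokeAI's implementation.
class DequantFallbackTensor(torch.Tensor):
    # Skip __torch_function__ so everything routes through __torch_dispatch__.
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, dense: torch.Tensor):
        return torch.Tensor._make_wrapper_subclass(cls, dense.shape, dtype=dense.dtype)

    def __init__(self, dense: torch.Tensor):
        self._dense = dense  # a real GGMLTensor would dequantize lazily here

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Returning NotImplemented for ops like aten.silu reproduces the error above;
        # unwrapping to the dense tensor and delegating avoids it.
        unwrap = lambda x: x._dense if isinstance(x, cls) else x
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

x = DequantFallbackTensor(torch.randn(4))
print(torch.nn.functional.silu(x))  # aten.silu now resolves via the fallback path
```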
Just ran the following manual tests:
- Speed / compatibility:
  - Non-quantized: 2.08 it/s
  - GGUF Q8_0: 1.61 it/s
  - GGUF Q4_K_S: 1.24 it/s
  - GGUF Q2_K: 1.27 it/s
  - bnb NF4: 1.89 it/s
- FLUX LoRAs can be applied on top of GGUF models
- Smoke tested a bunch of stuff after torch bump (SD1, SDXL, LoRA, ControlNet, IP-Adapter)
Cool!
I'm getting the following error, tried v5.1.0rc4 and v5.1.0rc3 (via Stability Matrix):
File "C:\Users\virat\AppData\Roaming\StabilityMatrix\Packages\InvokeAI\invokeai\backend\quantization\gguf\loaders.py", line 3, in
import gguf ModuleNotFoundError: No module named 'gguf'
Any suggestions on how I can resolve this on my end?
Looks like Stability Matrix isn't installing the gguf module. I would reach out to them.
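As a stopgap until Stability Matrix ships the dependency, installing it into that package's own Python environment should work. A hedged sketch follows; run it with the interpreter from the InvokeAI package's venv, not your system Python.

```python
# Workaround sketch: install the missing gguf dependency into whatever interpreter
# Stability Matrix uses to launch InvokeAI (i.e. run this with that venv's python).
import importlib.util
import subprocess
import sys

if importlib.util.find_spec("gguf") is None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "gguf"])
```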