ComfyUI-WanVideoWrapper

GGUF support for LORA merging - An SVI case

Open supereth opened this issue 4 months ago • 4 comments

For those of us without very high-end GPUs (5080 here), it's quite complicated to run even the FP8 models of Wan 2.2. The GGUFs are our salvation, but unfortunately LoRA merging is not compatible with them as of now, which limits our output quality a lot, especially in SVI-related workflows.

Recent updates to the llama.cpp framework have reportedly introduced functionality for merging LoRA adapters directly into GGUF models without requiring external frameworks like Hugging Face PEFT (a tool such as llama-export-lora to merge the LoRA adapter with the base model, if what I've read is correct). Is it realistic that this could be implemented in the WanVideoWrapper environment?

And @kijai, just out of curiosity, what hardware specs do you usually run with?

Thanks for all your contributions!

supereth avatar Oct 30 '25 15:10 supereth

Do you have links to those resources?

I'm not really able to spend much time on such, but if it's simple without lots of dependencies then sure.

But I'd also like to note that when using offloading such as block_swap, the real benefit of lower GGUF quants is mostly just the reduced RAM use. With full block_swap, for example, only a single block is on the GPU at a time, and the difference in size of a single block between quants is really small.
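To put rough numbers on that point: the figures below are illustrative assumptions (a ~14B-parameter model split into 40 transformer blocks, with approximate effective bits per weight for common quants), not measurements from WanVideoWrapper or exact Wan 2.2 figures.

```python
# Rough, illustrative estimate of per-block weight size at different
# precisions. Parameter count and block count are ASSUMPTIONS for a
# ~14B model, not exact Wan 2.2 numbers.

PARAMS_TOTAL = 14e9   # assumed total parameter count
NUM_BLOCKS = 40       # assumed number of transformer blocks

params_per_block = PARAMS_TOTAL / NUM_BLOCKS

# Approximate effective bits per weight, including quant metadata overhead.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "fp8": 8.0,
    "Q8_0": 8.5,
    "Q4_K": 4.5,
}

for name, bits in BITS_PER_WEIGHT.items():
    mb = params_per_block * bits / 8 / 1e6
    print(f"{name:>5}: ~{mb:.0f} MB per block")
```

Under these assumptions a single block is roughly 350 MB at fp8 versus roughly 200 MB at Q4_K, so with only one block resident on the GPU at a time, the VRAM difference between quants is on the order of 150 MB.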

I know the speed of the offloading depends on the system, but with single-block prefetch I'm personally not seeing any speed loss from using it on fp8 models.

Also, nothing about SVI should increase memory use, since it's an extension method, right?

kijai avatar Oct 30 '25 15:10 kijai

Apparently there's a tool called `llama-export-lora`.

This is far from my area of expertise, so apologies in advance if it leads to nothing, but I'll leave a couple of links here:

https://github.com/ggml-org/llama.cpp/discussions/8594#discussioncomment-10104888
https://github.com/ggml-org/llama.cpp/pull/8332

Although, scanning through the threads, it now feels like that is more about fine-tuning GGUF models to bake in LoRAs, rather than what would be much more interesting: the LoRA merging capability of the WanVideoLoraSelect nodes for GGUFs.
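For context, merging a LoRA into unquantized weights is just `W' = W + strength * (B @ A)`; what makes GGUF harder is that quantized weights must be dequantized before the addition and requantized afterwards, at some precision cost. Here is a minimal NumPy sketch of that round trip, using a toy symmetric int8 quantizer as a stand-in for real GGUF k-quants (all names and shapes here are illustrative, not WanVideoWrapper's actual implementation):

```python
import numpy as np

def quantize_int8(w):
    """Toy symmetric per-tensor int8 quantizer (stand-in for GGUF k-quants)."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
out_dim, in_dim, rank = 64, 64, 4

# Toy base weight and low-rank LoRA factors.
w = rng.standard_normal((out_dim, in_dim)).astype(np.float32)
lora_a = rng.standard_normal((rank, in_dim)).astype(np.float32) * 0.01
lora_b = rng.standard_normal((out_dim, rank)).astype(np.float32) * 0.01
lora_strength = 1.0

# Simulated GGUF weight: already quantized on disk.
q, scale = quantize_int8(w)

# Merge: dequantize -> add scaled low-rank update -> requantize.
w_deq = dequantize_int8(q, scale)
w_merged = w_deq + lora_strength * (lora_b @ lora_a)
q_merged, scale_merged = quantize_int8(w_merged)

# The round trip adds a little extra error on top of the original quantization.
target = w + lora_strength * (lora_b @ lora_a)
err = np.abs(dequantize_int8(q_merged, scale_merged) - target).max()
print(f"max abs error after merge round trip: {err:.4f}")
```

The dequantize/requantize step is also why merging into GGUFs is slower and slightly lossy compared to merging into fp16/fp8 weights.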

supereth avatar Oct 30 '25 16:10 supereth

As I have looked a bit more into the matter:

"The llama-export-lora tool has been removed from the llama.cpp repository and is no longer available. It was previously used for converting LoRA adapters but was deprecated due to bugs and poor performance.

As a result, there is no current functionality within llama.cpp to directly convert a LoRA safetensors file to a GGUF LoRA adapter using llama-export-lora.

Instead, the recommended approach is to use the convert_lora_to_gguf.py script, which is available in the llama.cpp repository.

This script requires both the base model and the LoRA adapter directory as inputs. For example, to convert a LoRA adapter from Hugging Face, you would run:

```
python3 convert_lora_to_gguf.py \
    --base models/Meta-Llama-3-8B-Instruct \
    --outfile models/llama-3-abliterated-f16-lora.gguf \
    --outtype f16 \
    models/Llama-3-Instruct-abliteration-LoRA-8B
```

This command converts the LoRA adapter into a GGUF format file that can be used with llama.cpp.

The resulting GGUF LoRA adapter can then be applied to a base model during inference using the --lora flag.

If you are working with a model trained using MLX, you may need to first convert the adapter to a PEFT-compatible format before using convert_lora_to_gguf.py. The script performs operations such as renaming keys and transposing weight matrices to ensure compatibility with llama.cpp.

After conversion, the resulting adapter can be used with the base model in GGUF format."

supereth avatar Nov 02 '25 09:11 supereth

Can anyone verify this? Are LoRAs in GGUF format available, and does this now allow merging a GGUF model with a GGUF LoRA?

GeorgeS2019 avatar Nov 11 '25 12:11 GeorgeS2019