
Community contribution: Adding GGUF support for more architectures

SunMarc opened this issue on Sep 2, 2024 · 44 comments

Feature request

Recently, we have added the ability to load gguf files within transformers.

The goal is to let users further train/fine-tune their gguf models.

Workflow:

  1. Load the gguf file in transformers: we dequantize the weights to fp32, then load them as regular PyTorch weights:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

# Passing gguf_file dequantizes the checkpoint into fp32 PyTorch weights
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
  2. Train/fine-tune the model as usual (see the sketch after this workflow).

  3. Convert the model back to gguf for use in the ggml ecosystem, either with the convert_hf_to_gguf.py script or with the gguf-my-repo space if you pushed your model to the Hub:

tokenizer.save_pretrained('directory')
model.save_pretrained('directory')

!python ${path_to_llama_cpp}/convert_hf_to_gguf.py ${directory}
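
For step 2, the loaded weights are plain fp32 PyTorch tensors, so any standard transformers training setup applies. Below is a minimal sketch using Trainer, reusing the model and tokenizer loaded above; the dataset, file name (train.txt), and hyperparameters are illustrative assumptions, not part of the original issue:

from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Hypothetical plain-text training file; replace with your own data
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,  # the dequantized model loaded from the gguf file above
    args=TrainingArguments(output_dir="directory", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False produces standard causal language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()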

Let's try to add GGUF support for more architectures! The currently supported architectures are:

  • [x] Llama
  • [x] Mistral
  • [x] Qwen2

It would be great to add support for more architectures, such as:

  • [x] Phi3 https://github.com/huggingface/transformers/pull/31844
  • [x] Qwen2Moe https://github.com/huggingface/transformers/pull/33264
  • [x] Gemma2
  • [x] T5 https://github.com/huggingface/transformers/pull/33389
  • [x] Falcon https://github.com/huggingface/transformers/pull/33437
  • [x] Bloom https://github.com/huggingface/transformers/pull/33473
  • [x] StableLM https://github.com/huggingface/transformers/pull/33793
  • [x] gpt2 https://github.com/huggingface/transformers/pull/34044
  • [x] starcoder2 https://github.com/huggingface/transformers/pull/34094
  • [ ] llama4
  • [ ] Deepseekv3
  • [ ] c4ai-command-a

... and many more (feel free to suggest more architectures! The model needs to already be integrated in transformers).

Adding support for a new architecture requires following the same protocol as in this PR:

  1. Update GGUF_TENSOR_MAPPING and GGUF_CONFIG_MAPPING to map the tensor names and config keys of the gguf file to their transformers counterparts (see the sketch after this list).
  2. Create a GGUFXXXConverter(XXXConverter) class that converts the gguf tokenizer into a transformers one.
  3. Write tests
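
To make steps 1 and 2 concrete, here is a rough sketch modeled on the existing Llama entries in src/transformers/integrations/ggml.py. The exact gguf keys depend on the target architecture, and "xxx", GGUFXXXConverter, and XXXConverter are placeholders, not real names:

# Step 1 (sketch): extend the mappings in src/transformers/integrations/ggml.py.
# Keys on the left are gguf tensor prefixes / metadata keys; values on the
# right are the corresponding transformers names. The entries below are
# copied from the Llama mapping as an example and will differ per architecture.
GGUF_TENSOR_MAPPING = {
    # ... existing architectures ...
    "xxx": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "attn_q": "self_attn.q_proj",
        "attn_k": "self_attn.k_proj",
        "attn_v": "self_attn.v_proj",
        "attn_output": "self_attn.o_proj",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "output_norm": "model.norm",
        "output.weight": "lm_head.weight",
    },
}

GGUF_CONFIG_MAPPING = {
    # ... existing architectures ...
    "xxx": {
        "context_length": "max_position_embeddings",
        "block_count": "num_hidden_layers",
        "embedding_length": "hidden_size",
        "feed_forward_length": "intermediate_size",
        "attention.head_count": "num_attention_heads",
    },
}

# Step 2 (sketch): build the tokenizer from the gguf metadata by subclassing
# the architecture's existing slow->fast tokenizer converter.
class GGUFXXXConverter(XXXConverter):  # XXXConverter is a placeholder name
    def __init__(self, tokenizer_dict):
        # GGUFTokenizerSkeleton wraps the raw tokenizer fields read from the gguf file
        self.proto = GGUFTokenizerSkeleton(tokenizer_dict)
        self.additional_kwargs = {}

For step 3, the existing gguf tests (loading a small gguf checkpoint and checking a short generation against an expected string, see tests/quantization/ggml/test_ggml.py) are a good template to follow.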

If you are interested in taking up the challenge, comment below with the architecture you want to integrate and open a PR!

Once you open a PR, feel free to ping @SunMarc, @LysandreJik, or @ArthurZucker for a review!

Motivation

Support for more gguf models

Your contribution

Reviewing PRs and possibly adding support for more models
