Community contribution: Adding GGUF support for more architectures
Feature request
Recently, we have added the ability to load gguf files within transformers.
The goal was to give users the possibility to further train/fine-tune their gguf models.
Workflow:
1) Load the gguf file in transformers: we dequantize the weights to fp32, then we load the weights to be used with PyTorch.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```
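Once loaded, the result behaves like any other transformers model, so a quick generation call can confirm the weights were dequantized correctly (the prompt here is arbitrary):

```python
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```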
2) Train/fine-tune the model; a minimal sketch follows below.
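As an illustration of the fine-tuning step, here is a minimal sketch using the `Trainer` API, continuing from the `model` and `tokenizer` loaded above. The dataset, sequence length, and hyperparameters are placeholders, not recommendations:

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Placeholder dataset: any text dataset works; drop empty lines.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.filter(lambda x: x["text"].strip())

# Llama-style tokenizers often have no pad token; reuse eos for padding.
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="directory",
        num_train_epochs=1,
        per_device_train_batch_size=1,
    ),
    train_dataset=tokenized,
    # Causal LM objective: labels are the input ids themselves (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```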
3) Convert the model back to gguf, to use it in the ggml ecosystem, with the convert-hf-to-gguf.py script, or with the gguf-my-repo space if you pushed your model to the Hub:

```python
tokenizer.save_pretrained('directory')
model.save_pretrained('directory')
```

```bash
python ${path_to_llama_cpp}/convert-hf-to-gguf.py ${directory}
```
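To sanity-check the round trip, the `gguf` Python package (from llama.cpp's gguf-py) can read the converted file's metadata back. A quick sketch, where `directory/model.gguf` is a hypothetical output name, since the convert script derives the actual filename from the model directory and output type:

```python
from gguf import GGUFReader

reader = GGUFReader("directory/model.gguf")  # hypothetical output path
print(f"{len(reader.tensors)} tensors")

# String metadata is stored as raw bytes; decode the architecture key.
arch = reader.fields["general.architecture"]
print(arch.parts[arch.data[-1]].tobytes().decode("utf-8"))
```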
Let's try to add GGUF support for more architectures! Currently supported architectures are:
- [x] Llama
- [x] Mistral
- [x] Qwen2
It would be great to add support for more architectures, such as:
- [x] Phi3 https://github.com/huggingface/transformers/pull/31844
- [x] Qwen2Moe https://github.com/huggingface/transformers/pull/33264
- [x] Gemma2
- [x] T5 https://github.com/huggingface/transformers/pull/33389
- [x] Falcon https://github.com/huggingface/transformers/pull/33437
- [x] Bloom https://github.com/huggingface/transformers/pull/33473
- [x] StableLM https://github.com/huggingface/transformers/pull/33793
- [x] gpt2 https://github.com/huggingface/transformers/pull/34044
- [x] starcoder2 https://github.com/huggingface/transformers/pull/34094
- [ ] llama4
- [ ] Deepseekv3
- [ ] c4ai-command-a
... and many more (feel free to suggest more architectures! The model needs to already be integrated in transformers.)
Adding this feature would require following the same protocol as in this PR:
- Update `GGUF_TENSOR_MAPPING` and `GGUF_CONFIG_MAPPING` in order to map the tensors/config of the gguf file to their transformers counterparts; a rough sketch follows after this list.
- Create a `GGUFXXXConverter(XXXConverter)` class to convert the gguf tokenizer to a transformers one.
- Write tests.
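For concreteness, here is a rough sketch of the shape these additions take in `src/transformers/integrations/ggml.py` for a hypothetical `xxx` architecture. The key names and converter body below are illustrative, following the pattern of the existing GGUF converters; check an already-supported architecture such as llama for the exact entries and full wiring:

```python
# Illustrative sketch only: the real entries live in src/transformers/integrations/ggml.py.

GGUF_TENSOR_MAPPING = {
    "xxx": {
        # gguf tensor name (prefix) -> transformers parameter name (prefix)
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "attn_q": "self_attn.q_proj",
        "ffn_up": "mlp.up_proj",
        "output_norm": "model.norm",
        "output.weight": "lm_head.weight",
    },
}

GGUF_CONFIG_MAPPING = {
    "xxx": {
        # gguf metadata key -> transformers config attribute
        "context_length": "max_position_embeddings",
        "block_count": "num_hidden_layers",
        "embedding_length": "hidden_size",
        "feed_forward_length": "intermediate_size",
        "attention.head_count": "num_attention_heads",
    },
}

class GGUFXXXConverter(XXXConverter):  # XXXConverter: the architecture's tokenizer converter
    def __init__(self, tokenizer_dict):
        # gguf stores the tokenizer as plain dicts/arrays instead of files, so
        # the existing converters wrap it in a skeleton object before converting.
        self.proto = GGUFTokenizerSkeleton(tokenizer_dict)
        self.original_tokenizer = self.proto
        self.additional_kwargs = {}
```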
If you are interested in taking up the challenge, comment below with the architecture name you want to integrate and open a PR!
Once you open a PR, feel free to ping @SunMarc @LysandreJik @ArthurZucker for a review!
Motivation
Support for more gguf models
Your contribution
Reviewing PRs and possibly adding support for more models