
Community contribution: Adding GGUF support for more architectures

SunMarc opened this issue on Sep 2, 2024 · 44 comments

Feature request

Recently, we have added the ability to load gguf files within transformers.

The goal is to let users further train/fine-tune their gguf models.

Workflow:

  1. Load the gguf file in transformers: we dequantize the weights to fp32, then load them as regular PyTorch weights:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

# Passing gguf_file dequantizes the checkpoint into fp32 PyTorch weights
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
  2. Train/fine-tune the model as usual (see the sketch after this workflow).

  3. Convert the model back to gguf for use in the ggml ecosystem, either with the convert_hf_to_gguf.py script or with the gguf-my-repo space if you pushed your model to the Hub:

tokenizer.save_pretrained('directory')
model.save_pretrained('directory')

!python ${path_to_llama_cpp}/convert_hf_to_gguf.py ${directory}
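
For step 2, the loaded weights are plain fp32 PyTorch tensors, so any standard transformers training setup applies. Below is a minimal sketch using Trainer, reusing the model and tokenizer loaded above; the dataset, file name (train.txt), and hyperparameters are illustrative assumptions, not part of the original issue:

from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Hypothetical plain-text training file; replace with your own data
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,  # the dequantized model loaded from the gguf file above
    args=TrainingArguments(output_dir="directory", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False produces standard causal language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()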

Let's try to add GGUF support for more architectures! The currently supported architectures are:

  • [x] Llama
  • [x] Mistral
  • [x] Qwen2

It would be great to add support for more architectures, such as:

  • [x] Phi3 https://github.com/huggingface/transformers/pull/31844
  • [x] Qwen2Moe https://github.com/huggingface/transformers/pull/33264
  • [x] Gemma2
  • [x] T5 https://github.com/huggingface/transformers/pull/33389
  • [x] Falcon https://github.com/huggingface/transformers/pull/33437
  • [x] Bloom https://github.com/huggingface/transformers/pull/33473
  • [x] StableLM https://github.com/huggingface/transformers/pull/33793
  • [x] gpt2 https://github.com/huggingface/transformers/pull/34044
  • [x] starcoder2 https://github.com/huggingface/transformers/pull/34094
  • [ ] llama4
  • [ ] Deepseekv3
  • [ ] c4ai-command-a

... and many more (feel free to suggest more architectures! The model needs to already be integrated in transformers).

Adding support for a new architecture requires following the same protocol as in this PR:

  1. Update GGUF_TENSOR_MAPPING and GGUF_CONFIG_MAPPING to map the tensor names and config keys of the gguf file to their transformers counterparts (see the sketch after this list).
  2. Create a GGUFXXXConverter(XXXConverter) class that converts the gguf tokenizer into a transformers one.
  3. Write tests
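
To make steps 1 and 2 concrete, here is a rough sketch modeled on the existing Llama entries in src/transformers/integrations/ggml.py. The exact gguf keys depend on the target architecture, and "xxx", GGUFXXXConverter, and XXXConverter are placeholders, not real names:

# Step 1 (sketch): extend the mappings in src/transformers/integrations/ggml.py.
# Keys on the left are gguf tensor prefixes / metadata keys; values on the
# right are the corresponding transformers names. The entries below are
# copied from the Llama mapping as an example and will differ per architecture.
GGUF_TENSOR_MAPPING = {
    # ... existing architectures ...
    "xxx": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "attn_q": "self_attn.q_proj",
        "attn_k": "self_attn.k_proj",
        "attn_v": "self_attn.v_proj",
        "attn_output": "self_attn.o_proj",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "output_norm": "model.norm",
        "output.weight": "lm_head.weight",
    },
}

GGUF_CONFIG_MAPPING = {
    # ... existing architectures ...
    "xxx": {
        "context_length": "max_position_embeddings",
        "block_count": "num_hidden_layers",
        "embedding_length": "hidden_size",
        "feed_forward_length": "intermediate_size",
        "attention.head_count": "num_attention_heads",
    },
}

# Step 2 (sketch): build the tokenizer from the gguf metadata by subclassing
# the architecture's existing slow->fast tokenizer converter.
class GGUFXXXConverter(XXXConverter):  # XXXConverter is a placeholder name
    def __init__(self, tokenizer_dict):
        # GGUFTokenizerSkeleton wraps the raw tokenizer fields read from the gguf file
        self.proto = GGUFTokenizerSkeleton(tokenizer_dict)
        self.additional_kwargs = {}

For step 3, the existing gguf tests (loading a small gguf checkpoint and checking a short generation against an expected string, see tests/quantization/ggml/test_ggml.py) are a good template to follow.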

If you are interested in taking up the challenge, comment below with the architecture you want to integrate and open a PR!

Once you open a PR, feel free to ping @SunMarc, @LysandreJik, or @ArthurZucker for a review!

Motivation

Support for more gguf models

Your contribution

Reviewing PRs and possibly adding support for more models
