
Community contribution: Adding GGUF support for more architectures

Open SunMarc opened this issue 1 year ago • 44 comments

Feature request

Recently, we have added the ability to load gguf files within transformers.

The goal was to let users further train/fine-tune their gguf models.

The workflow:

  1. Load the gguf file in transformers: we dequantize the weights to fp32, then load them so they can be used with PyTorch.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

# Passing gguf_file dequantizes the checkpoint to fp32 at load time.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
  2. Train/fine-tune the dequantized model (a minimal sketch is shown after this workflow).

  3. Convert the model back to gguf for use in the ggml ecosystem, using the convert_hf_to_gguf script, or using the gguf-my-repo space if you pushed your model to the Hub:

tokenizer.save_pretrained('directory')
model.save_pretrained('directory')

# Run the llama.cpp conversion script on the saved checkpoint.
!python ${path_to_llama_cpp}/convert_hf_to_gguf.py ${directory}
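
A minimal sketch for step 2 (fine-tuning the dequantized model), reusing the tokenizer and model loaded above. The dataset, hyperparameters, and output directory here are illustrative assumptions, not part of the original workflow:

from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Illustrative toy corpus; substitute your own data.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,  # the dequantized fp32 model loaded above
    args=TrainingArguments(output_dir="directory", num_train_epochs=1, per_device_train_batch_size=1),
    train_dataset=tokenized,
    # Causal LM objective: labels are the (shifted) input ids, so mlm=False.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()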

Let's try to add GGUF support for more architectures! Currently supported architectures are

  • [x] Llama
  • [x] Mistral
  • [x] Qwen2

It would be great to add support for more architectures, such as

  • [x] Phi3 https://github.com/huggingface/transformers/pull/31844
  • [x] Qwen2Moe https://github.com/huggingface/transformers/pull/33264
  • [x] Gemma2
  • [x] T5 https://github.com/huggingface/transformers/pull/33389
  • [x] Falcon https://github.com/huggingface/transformers/pull/33437
  • [x] Bloom https://github.com/huggingface/transformers/pull/33473
  • [x] StableLM https://github.com/huggingface/transformers/pull/33793
  • [x] gpt2 https://github.com/huggingface/transformers/pull/34044
  • [x] starcoder2 https://github.com/huggingface/transformers/pull/34094
  • [ ] llama4
  • [ ] Deepseekv3
  • [ ] c4ai-command-a

... and many more. (Feel free to suggest more architectures! The model needs to be integrated in transformers.)

Adding this feature requires following the same protocol as in this PR:

  1. Update GGUF_TENSOR_MAPPING and GGUF_CONFIG_MAPPING to map the tensor/config names in the gguf file to their transformers counterparts (see the sketch after this list).
  2. Create a GGUFXXXConverter(XXXConverter) class to convert the gguf tokenizer to a transformers one.
  3. Write tests
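
To make step 1 concrete, here is a sketch modeled on the existing Llama entries in src/transformers/integrations/ggml.py. The architecture key "my_arch" is hypothetical, and the gguf-side names for a real architecture depend on how llama.cpp writes its tensors and metadata, so treat every entry as illustrative:

# Illustrative additions; "my_arch" is a hypothetical architecture name.
GGUF_TENSOR_MAPPING["my_arch"] = {
    "token_embd": "model.embed_tokens",  # gguf tensor prefix -> transformers parameter name
    "blk": "model.layers",
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "attn_output": "self_attn.o_proj",
    "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
    "attn_norm": "input_layernorm",
    "output_norm": "model.norm",
    "output.weight": "lm_head.weight",
}

GGUF_CONFIG_MAPPING["my_arch"] = {
    "context_length": "max_position_embeddings",  # gguf metadata key -> config attribute
    "block_count": "num_hidden_layers",
    "embedding_length": "hidden_size",
    "feed_forward_length": "intermediate_size",
    "attention.head_count": "num_attention_heads",
}

For step 2, the existing GGUFLlamaConverter in the same file is a useful reference: the gguf converter subclasses the architecture's slow-tokenizer converter and rebuilds the tokenizer from the gguf metadata.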

If you are interested in taking up the challenge, comment below with the architecture name you want to integrate and open a PR!

Once you open a PR, feel free to ping @SunMarc @LysandreJik @ArthurZucker for a review!

Motivation

Support for more gguf models

Your contribution

Reviewing PRs and possibly adding support for more models

SunMarc avatar Sep 02 '24 13:09 SunMarc

@SunMarc I am going to take Qwen2Moe

VladOS95-cyber avatar Sep 02 '24 14:09 VladOS95-cyber

@SunMarc I want to take Gemma2

KingNish24 avatar Sep 02 '24 16:09 KingNish24

@SunMarc May I suggest & take T5? It seems the GGUF version of the T5 encoder is widely used together with FLUX.

junejae avatar Sep 03 '24 05:09 junejae

@SunMarc Hello! Unless someone else is working on this model already, may I take MiniCPM-V?

010kim avatar Sep 03 '24 06:09 010kim

> @SunMarc May I suggest & take T5? It seems the GGUF version of the T5 encoder is widely used together with FLUX.

Added, @junejae!

> @SunMarc Hello! Unless someone else is working on this model already, may I take MiniCPM-V?

Hi @010kim, thanks for the interest! The MiniCPM-V model relies on trust_remote_code=True, so I don't think we can add gguf support for it for now. We don't want code in transformers that relies on modeling files hosted on the Hub. I will think about extending trust_remote_code=True to gguf support, so that the author of the model can add it himself!

SunMarc avatar Sep 03 '24 11:09 SunMarc

> Hi @010kim, thanks for the interest! The MiniCPM-V model relies on trust_remote_code=True, so I don't think we can add gguf support for it for now. We don't want code in transformers that relies on modeling files hosted on the Hub. I will think about extending trust_remote_code=True to gguf support, so that the author of the model can add it himself!

@SunMarc Thank you so much for your response. It also makes sense that the author should work on it. What about Cohere? Can I take it?

010kim avatar Sep 03 '24 12:09 010kim

Hi @SunMarc 👋🏻 May I work with CLIP model if nobody is working on it?

jungnerd avatar Sep 05 '24 05:09 jungnerd

Hey @jungnerd! The model you choose needs to be in the hf-to-gguf conversion script. See the script here.

SunMarc avatar Sep 05 '24 14:09 SunMarc

Hey @SunMarc 🙋‍♂️ I'd like to try my hand at contributing to this issue, can I take Falcon? 🦅

g-prz avatar Sep 09 '24 07:09 g-prz

Hi @SunMarc, I'll take Bloom if nobody is working on it.

VladOS95-cyber avatar Sep 11 '24 16:09 VladOS95-cyber

Hi @SunMarc, I'd like to handle the work related to Codestral :)

fabxoe avatar Sep 12 '24 13:09 fabxoe

> Hey @jungnerd! The model you choose needs to be in the hf-to-gguf conversion script. See the script here.

There is a conversion script for the CLIP model (clip.cpp). Can I use this to contribute?

jungnerd avatar Sep 13 '24 03:09 jungnerd

Hi @SunMarc, I'm interested in this issue. Would it be okay if I worked on the BLIP model?

cjfghk5697 avatar Sep 14 '24 03:09 cjfghk5697

> Hi @SunMarc, I'm interested in this issue. Would it be okay if I worked on the BLIP model?

Hi @SunMarc, I'd like to work on the BLIP model, but after researching, I found that it might be challenging due to the Vision model structure. Would it be alright if I switched to working on the Smol model instead?

cjfghk5697 avatar Sep 14 '24 17:09 cjfghk5697

Hey @SunMarc 🤗 Gonna continue with granite 🪨

g-prz avatar Sep 19 '24 09:09 g-prz

@SunMarc I checked the Smol model and confirmed that it's already working well without needing any further changes. The issue mentioned that supporting the Smol model would be beneficial, but is there any specific work required?

If not, I’ll proceed with switching to the dbrx model.

cjfghk5697 avatar Sep 21 '24 08:09 cjfghk5697

> @SunMarc I checked the Smol model and confirmed that it's already working well without needing any further changes. The issue mentioned that supporting the Smol model would be beneficial, but is there any specific work required?
>
> If not, I’ll proceed with switching to the dbrx model.

Oh indeed, this is because it uses the Llama architecture.

SunMarc avatar Sep 23 '24 15:09 SunMarc

Hi @SunMarc! I am going to start working on StableLM model

VladOS95-cyber avatar Sep 29 '24 07:09 VladOS95-cyber

Is any work being done on Gemma2? If not, I would like to proceed with it! @SunMarc @KingNish24

yijun-lee avatar Oct 02 '24 16:10 yijun-lee

Hi @SunMarc! I suppose GPT2 gguf is not supported yet; if that's the case, I'll take it.

VladOS95-cyber avatar Oct 03 '24 13:10 VladOS95-cyber

> Hi @SunMarc, I'd like to handle the work related to Codestral :)

Codestral's tokenizer is just the Llama tokenizer. It looks like I don't need to handle any Codestral-specific code.

fabxoe avatar Oct 05 '24 08:10 fabxoe

> @SunMarc Thank you so much for your response. It also makes sense that the author should work on it. What about Cohere? Can I take it?

I went through the code, and I was able to load the Cohere gguf model, but I could not load the tokenizer. This is because the Cohere slow tokenizer is not implemented in HuggingFace (only a fast tokenizer is available for Cohere). Is there a way around this? @SunMarc

010kim avatar Oct 06 '24 12:10 010kim

Hey @SunMarc! I'll take Starcoder2 as my next model.

VladOS95-cyber avatar Oct 11 '24 13:10 VladOS95-cyber

Hi @SunMarc! I am going to start working on Mamba

VladOS95-cyber avatar Oct 14 '24 14:10 VladOS95-cyber

Are you still working on Gemma2, @yijun-lee @KingNish24? If not, would it be possible for me to try working on it? Thank you!

farrosalferro avatar Nov 08 '24 07:11 farrosalferro

> Are you still working on Gemma2, @yijun-lee @KingNish24? If not, would it be possible for me to try working on it? Thank you!

I’m running behind schedule, but I’m making progress! I’ll handle it.

yijun-lee avatar Nov 08 '24 07:11 yijun-lee

> I’m running behind schedule, but I’m making progress! I’ll handle it.

Glad to know! Then is it possible for me to try working on Nemotron? @SunMarc

farrosalferro avatar Nov 08 '24 07:11 farrosalferro

Could you please kindly check my PR, @SunMarc? Thank you! (Add Nemotron GGUF Loading Support)

farrosalferro avatar Nov 14 '24 07:11 farrosalferro

anyone working on supporting DeepSeek V3?

zinccat avatar Jan 30 '25 22:01 zinccat

DeepSeek V3 is not supported yet in transformers, but it will be soon with this PR.

SunMarc avatar Feb 05 '25 12:02 SunMarc