
[Quick poll] Give your opinion on which model should also be supported

PanQiWei opened this issue 2 years ago • 3 comments

For everyone interested in auto-gptq: I will focus on expanding auto-gptq to support more models and more network architectures during the upgrade from version 0.2.0 to 0.3.0.

Feel free to make a request by raising an issue with an enhancement label for the model you want supported in auto-gptq, and describe its advantages 🥳. Or you can simply comment below!

PanQiWei avatar May 26 '23 18:05 PanQiWei

There are two models I'd like to see added: gpt4all-j-v1.3-groovy and wizard-13b-uncensored.

Also willing to contribute or help if needed.

MetaWabbit avatar May 29 '23 10:05 MetaWabbit

  1. MPT models by MosaicML: https://huggingface.co/mosaicml
  2. Pythia-based models by Open Assistant: https://huggingface.co/OpenAssistant
  3. Instructor models by HK University: https://huggingface.co/hkunlp
  4. Falcon 40B (base and instruct) by Technology Innovation Institute: https://huggingface.co/tiiuae

abhinavkulkarni avatar May 29 '23 16:05 abhinavkulkarni

> There are two models I'd like to see added: gpt4all-j-v1.3-groovy and wizard-13b-uncensored.
>
> Also willing to contribute or help if needed.

Hi, it seems these two models' model_type values are gptj and llama, respectively, both of which are already supported by auto_gptq.
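
If you want to verify this yourself, here is a minimal sketch (not from this thread; the repo IDs are my guesses for checkpoints of the two requested models) that prints a checkpoint's model_type, which is what auto_gptq keys its support on:

```python
# Sketch: inspect a checkpoint's model_type via its Hugging Face config.
# The repo IDs below are assumptions standing in for the two requested models.
from transformers import AutoConfig

for repo_id in ("nomic-ai/gpt4all-j", "ehartford/WizardLM-13B-Uncensored"):
    config = AutoConfig.from_pretrained(repo_id)
    print(f"{repo_id}: model_type={config.model_type}")  # expect "gptj" / "llama"
```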

PanQiWei avatar May 30 '23 00:05 PanQiWei

Hello, MPT would be great!

MPT uses ALiBi, while Falcon offers multi-query attention in the 40B model. Both have their strengths.
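
For context, here's a toy sketch of the ALiBi idea (my own illustration, not MPT's actual code): instead of positional embeddings, each attention head adds a linear penalty to its scores proportional to the query-key distance:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear position bias, simplified for power-of-two head counts."""
    # One slope per head, forming a geometric sequence: 2^(-8/n), ..., 2^(-8).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # distance[i, j] = j - i, clamped so only past positions are penalized.
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)
    # Shape (num_heads, seq_len, seq_len); added to attention scores pre-softmax.
    return slopes[:, None, None] * distance

print(alibi_bias(num_heads=8, seq_len=4)[0])
```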

debackerl avatar Jun 12 '23 13:06 debackerl

MPT support would be great, yeah. There was a PR for that but apparently there's a problem in the base MPT repo which meant it couldn't work. I don't know if that's still the case or not.

We already have FalconLM support. Unfortunately it's really slow at the moment - like less than 1 token/s on FalconLM 40B, even on an H100 80GB. We're not sure why. But it does work.

Here are a few Falcon GPTQ 4bit models you can try (a loading sketch follows the lists below):

  • https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ
  • https://huggingface.co/TheBloke/falcon-7b-instruct-GPTQ
  • https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ
  • https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GPTQ

I also did a couple in 3bit, so they can load on a 24GB GPU:

  • https://huggingface.co/TheBloke/falcon-40b-instruct-3bit-GPTQ
  • https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-3bit-GPTQ
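
A minimal loading sketch for one of these, assuming auto_gptq's from_quantized API as it stood at the time; depending on the repo you may also need to pass model_basename=... matching the checkpoint file name:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo_id = "TheBloke/falcon-7b-instruct-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoGPTQForCausalLM.from_quantized(
    repo_id,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,  # Falcon shipped custom modeling code at the time
)

inputs = tokenizer("Write a haiku about quantization:", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```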

TheBloke avatar Jun 12 '23 14:06 TheBloke