AutoGPTQ
[Quick poll] Give your opinion on which models should also be supported
For everyone interested in auto-gptq: I will focus on expanding auto-gptq to support more models and more network architectures during the upgrade from version 0.2.0 to 0.3.0.
Feel free to make a request by raising an issue with an enhancement label for the model you want supported in auto-gptq, and describe its advantages 🥳. Or you can simply comment below!
There are two models I'd like to see added: gpt4all-j-v1.3-groovy and wizard-13b-uncensored.
I'm also willing to contribute or help if needed.
- MPT models by MosaicML: https://huggingface.co/mosaicml
- Pythia-based models by Open Assistant: https://huggingface.co/OpenAssistant
- Instructor models by HK University: https://huggingface.co/hkunlp
- Falcon 40B (base and instruct) by Technology Innovation Institute: https://huggingface.co/tiiuae
> There are two models I'd like to see added: gpt4all-j-v1.3-groovy and wizard-13b-uncensored.
> I'm also willing to contribute or help if needed.
Hi, it seems these two models' model_type values are gptj and llama, respectively, both of which are already supported by auto-gptq.
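In other words, support is keyed on the model_type field in a repo's config.json rather than on the repo name. A minimal sketch of that check (the supported set below is partial and assumed, apart from gptj and llama which this thread confirms):

```python
import json

# Model types auto-gptq can quantize; gptj and llama confirmed above,
# the rest are assumed for illustration.
SUPPORTED = {"gptj", "llama", "gpt2", "opt", "bloom", "gpt_neox"}

def backend_supported(config_json: str) -> bool:
    """Return True if the repo's config.json declares a supported model_type."""
    config = json.loads(config_json)
    return config.get("model_type") in SUPPORTED

# gpt4all-j-v1.3-groovy reports model_type "gptj"; wizard-13b is a llama fine-tune
print(backend_supported('{"model_type": "gptj"}'))   # True
print(backend_supported('{"model_type": "llama"}'))  # True
```

So fine-tunes of an already-supported architecture generally work out of the box, without a dedicated auto-gptq model class.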
Hello, MPT would be great!
MPT uses ALiBi, while Falcon offers multi-query attention in its 40B model. Both have their strengths.
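To make the multi-query point concrete: sharing one K/V head across all query heads shrinks the KV cache by roughly the number of query heads. A back-of-the-envelope sketch (layer count, head count, and head_dim below are illustrative, not Falcon's exact config):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # K and V caches: 2 tensors of shape [n_kv_heads, seq_len, head_dim] per layer
    return 2 * n_layers * n_kv_heads * seq_len * head_dim * bytes_per_elem

# Illustrative config: 60 layers, 64 query heads, head_dim 128, fp16 cache
mha = kv_cache_bytes(60, 64, 128, seq_len=2048)  # every head keeps its own K/V
mqa = kv_cache_bytes(60, 1, 128, seq_len=2048)   # one shared K/V head (multi-query)
print(mha // mqa)  # 64: the cache shrinks by the number of query heads
```

That smaller cache is what makes long-context batched inference cheaper on multi-query models.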
MPT support would be great, yeah. There was a PR for that, but apparently there's a problem in the base MPT repo that meant it couldn't work. I don't know if that's still the case or not.
We already have FalconLM support. Unfortunately it's really slow at the moment - like less than 1 token/s on FalconLM 40B, even on an H100 80GB. We're not sure why. But it does work.
Here are a few Falcon GPTQ 4-bit models you can try:
- https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ
- https://huggingface.co/TheBloke/falcon-7b-instruct-GPTQ
- https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ
- https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GPTQ
I also did a couple in 3bit, so they can load on a 24GB GPU:
- https://huggingface.co/TheBloke/falcon-40b-instruct-3bit-GPTQ
- https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-3bit-GPTQ
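A back-of-the-envelope calculation shows why the 3-bit quants fit on a 24 GB card while 4-bit does not. This is a weight-only estimate; real usage adds quantization metadata, activations, and the KV cache, so the true footprint is higher:

```python
def quantized_weight_gb(n_params: float, bits: int) -> float:
    """Rough weight-only footprint in GB: params * bits / 8 bytes."""
    return n_params * bits / 8 / 1e9

# Falcon 40B at different quantization widths
print(quantized_weight_gb(40e9, 4))  # 20.0 GB: over budget on 24 GB once overhead is added
print(quantized_weight_gb(40e9, 3))  # 15.0 GB: leaves headroom, hence the 3-bit quants
```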