text-generation-webui
GPT-NeoX and Pythia support + GPTQ-for-GPT-NeoX branch
I am working on integrating GPT-NeoX and Pythia support into GPTQ-for-LLaMa, aiming to add 4-bit GPTQ quantization and inference capabilities. This would allow NeoX-20B to run on a single RTX 3090, or Pythia-12B on even lower-end hardware, entirely in VRAM.
I have uploaded two files, neox.py and neox2.py, which represent two different approaches I attempted. However, my limited understanding of NeoX's layers and intermediate experience with Python have hindered my progress.
I have spent hours on this, but I am stuck. If anyone has expertise in the NeoX architecture and layer structure, please offer assistance.
https://github.com/Digitous/GPTQ-for-GPT-NeoX
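For anyone digging into this: a GPTQ port to a new architecture largely comes down to telling the quantizer which linear sublayers to process inside each transformer block. Below is a minimal sketch of enumerating those layers for GPT-NeoX with Hugging Face transformers; the module names follow the GPTNeoXForCausalLM implementation, and the find_layers helper only mirrors the idea used in GPTQ-for-LLaMa, it is not code from either repo.

```python
# Minimal sketch: list the nn.Linear sublayers a GPTQ port for GPT-NeoX
# would need to quantize in each transformer block.
# Assumes the Hugging Face GPTNeoXForCausalLM layout (model.gpt_neox.layers);
# find_layers is written from scratch here, not copied from any repo.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

def find_layers(module, layer_types=(nn.Linear,), prefix=""):
    """Recursively collect sublayers of the given types, keyed by dotted name."""
    found = {}
    for name, child in module.named_modules():
        if isinstance(child, layer_types):
            full_name = f"{prefix}.{name}" if prefix else name
            found[full_name] = child
    return found

if __name__ == "__main__":
    # Pythia-70M is tiny and shares the GPT-NeoX block structure,
    # so it is convenient for checking the layer names.
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/pythia-70m", torch_dtype=torch.float32
    )
    blocks = model.gpt_neox.layers
    for i, block in enumerate(blocks[:1]):  # inspect the first block only
        for name, layer in find_layers(block, prefix=f"layers.{i}").items():
            print(name, tuple(layer.weight.shape))
    # Expected sublayers per block: attention.query_key_value,
    # attention.dense, mlp.dense_h_to_4h, mlp.dense_4h_to_h
```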
It would be nice if this worked. I am personally interested in quantizing GALACTICA-30b as well, and I managed to generate a galactica-30b-4bit.pt file, but the outputs of this quantized model were garbage.
https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/46
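A quick way to tell a broken quantized checkpoint apart from ordinary quality loss is to compare perplexity against the FP16 model on a short passage: a correct 4-bit GPTQ model should stay close, while a broken one diverges by orders of magnitude. The sketch below assumes both models expose the standard transformers causal-LM interface; how the 4-bit .pt file gets loaded depends on the specific GPTQ repo and is only a hypothetical placeholder here.

```python
# Minimal sketch: perplexity sanity check for a quantized model.
# Assumes both models expose the standard transformers causal-LM interface;
# loading the 4-bit .pt checkpoint depends on the GPTQ repo in use and is
# represented below by a hypothetical load_quantized() placeholder.
import torch
from transformers import AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, text, device="cuda"):
    enc = tokenizer(text, return_tensors="pt").to(device)
    out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Illustrative usage (names are assumptions, not from the thread):
# tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-30b")
# fp16_model = AutoModelForCausalLM.from_pretrained(...).to("cuda")
# quant_model = load_quantized("galactica-30b-4bit.pt")  # hypothetical loader
# print(perplexity(fp16_model, tokenizer, sample_text))
# print(perplexity(quant_model, tokenizer, sample_text))
# A working 4-bit model should land within a few percent of FP16;
# a corrupted one typically shows perplexity in the hundreds or worse.
```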
Closing my repo; after some back and forth in DMs about what I tried, a fellow member of the KAI Discord figured out a working implementation via https://github.com/0cc4m/GPTQ-for-LLaMa/tree/gptneox
I'm about to try it out as soon as I download TogetherComputer's new NeoX-20B instruct-based chat model.
Also, if it works as hoped, the code is open to integration if you're interested; I'm 99% sure Occam would be all for it. If any other model integrations pop up, I'll share them.
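The model referred to is presumably TogetherComputer's GPT-NeoXT-Chat-Base-20B from OpenChatKit; a minimal transformers loading sketch is below. The repo id, dtype, and prompt format are assumptions, and in FP16 the ~20B parameters alone need roughly 40 GB of memory, which is exactly why a 4-bit GPTQ path that fits a single 24 GB card is attractive.

```python
# Minimal sketch: loading the TogetherComputer NeoX chat model in FP16.
# The repo id togethercomputer/GPT-NeoXT-Chat-Base-20B and the <human>/<bot>
# prompt format are assumptions based on OpenChatKit, not from the thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # spread across available GPUs / offload if needed
)

prompt = "<human>: What is GPTQ quantization?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```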
Did you manage to get this working? I'm getting an error while installing the CUDA kernel on that branch.
I get a NaN error from it when generating. The kernel is fine unless you have an older GPU, like pre-Pascal.
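Since the kernel reportedly misbehaves on pre-Pascal cards, a quick way to check where your GPU stands is the CUDA compute capability reported by PyTorch: Pascal corresponds to compute capability 6.x, so a major version below 6 means pre-Pascal. A small check, assuming a PyTorch build with CUDA support:

```python
# Quick check: is the GPU pre-Pascal (compute capability < 6.0)?
# Pascal (GTX 10xx / P100) is CC 6.x; Maxwell and older report below that
# and are the cards said to produce NaNs with this kernel.
import torch

if not torch.cuda.is_available():
    print("No CUDA device visible to PyTorch.")
else:
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        status = "pre-Pascal (expect problems)" if major < 6 else "Pascal or newer"
        print(f"GPU {i}: {name}, compute capability {major}.{minor} -> {status}")
```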
OSST works.
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.