
GPT-NeoX and Pythia support + GPTQ-for-GPT-NeoX branch

Digitous opened this issue 1 year ago • 4 comments

I am working on integrating GPT-NeoX and Pythia support into GPTQ-for-LLaMa, aiming to add 4-bit GPTQ quantization and inference capabilities. This would enable a NeoX20B to run on a single RTX3090, or Pythia12B on even lower-end hardware, using only VRAM.
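To give a rough sense of why 4-bit quantization makes these sizes feasible, here is a back-of-the-envelope VRAM estimate. The ~20% overhead factor is an assumption of mine (covering activations, groupwise scales/zeros, and the KV cache), not a figure from GPTQ-for-LLaMa:

```python
# Rough VRAM estimate for 4-bit quantized weights (illustrative only).
# The 20% overhead factor is a guess, not a measured number.

def vram_gb(n_params_billion, bits=4, overhead=0.20):
    """Approximate GB of VRAM needed to hold the weights at a given bit width."""
    weight_gb = n_params_billion * 1e9 * (bits / 8) / 1e9
    return weight_gb * (1 + overhead)

print(f"NeoX-20B  @ 4-bit: ~{vram_gb(20):.1f} GB")  # ~12 GB, fits a 24 GB RTX 3090
print(f"Pythia-12B @ 4-bit: ~{vram_gb(12):.1f} GB")  # ~7.2 GB, fits mid-range cards
```

By the same math, the full fp16 weights of NeoX-20B alone would need roughly 40 GB, which is why it cannot run unquantized on a single consumer GPU.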

I have uploaded two files, neox.py and neox2.py, which represent two different approaches I attempted. However, my limited understanding of NeoX's layers and intermediate experience with Python have hindered my progress.

I have spent hours on this, but I am stuck. If anyone has expertise in the NeoX architecture and layer structure, please offer assistance.

https://github.com/Digitous/GPTQ-for-GPT-NeoX

Digitous avatar Mar 15 '23 20:03 Digitous

It would be nice if this worked. I am personally interested in quantizing GALACTICA-30b as well, and I managed to generate a galactica-30b-4bit.pt file, but the outputs of this quantized model were garbage.

https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/46

oobabooga avatar Mar 15 '23 22:03 oobabooga


Closing my repo; after some back and forth in DMs about what I tried, a fellow member of the KAI Discord figured out a working implementation via https://github.com/0cc4m/GPTQ-for-LLaMa/tree/gptneox

I'm about to try it out as soon as I download TogetherComputer's new NeoX-20B instruct-based chat model.

Also, if it works as hoped, the code is open for integration if you're interested; I'm 99% sure Occam would be all for it. If any other model integrations pop up, I'll share them.

Digitous avatar Mar 15 '23 22:03 Digitous

Did you manage to get this working? I'm getting an error while installing the CUDA kernel on that branch.

Wingie avatar Mar 22 '23 18:03 Wingie

I get a NaN error from it when generating. The kernel is fine unless you have an older GPU, like pre-Pascal.

OSST works.

Ph0rk0z avatar Mar 24 '23 01:03 Ph0rk0z

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

github-actions[bot] avatar Apr 25 '23 23:04 github-actions[bot]