
GPT-J and Pygmalion-6b 4bit

Open mayaeary opened this issue 2 years ago • 6 comments

Support 4-bit GPTQ for GPT-J-6b and Pygmalion-6b.

You need my fork of GPTQ-for-LLaMa for it to work. It was forked from commit 468c47c01b4fe370616747b6d69a2d3f48bab5e4, so it should be compatible with the current version.

mkdir repositories
cd repositories
git clone https://github.com/mayaeary/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git checkout gptj
python setup_cuda.py install
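
If the build succeeds, a quick import check confirms the kernel is visible to Python (quant_cuda is the extension setup_cuda.py builds here; the module name is inferred from the wheel filename mentioned below):

python -c "import quant_cuda; print('quant_cuda OK')"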

To quantize the model:

# from repositories/GPTQ-for-LLaMa
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev c4 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt

It seems to work, but can someone else test it?

UPD. https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt - quantized checkpoint for pygmalion-6b_dev
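
For example, you can fetch it straight into the models folder (the target path is illustrative; adjust to your setup):

wget -P models/ https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt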

mayaeary avatar Mar 23 '23 20:03 mayaeary

Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?

I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows

Brawlence avatar Mar 23 '23 20:03 Brawlence

> Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?
>
> I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows

You can install Visual Studio Build Tools; it's only the compiler and libraries, without the IDE.

I attached this file, but I don't know if it'll work: quant_cuda-0.0.0-py3.10-win-amd64.egg.zip

mayaeary avatar Mar 23 '23 20:03 mayaeary

Whoa. Something wild is going on with gptj.py. It asked for an HF token (which I provided) and then failed to quantize. BUT thanks to your egg file and the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main, it actually worked.

Pyg-6b-q4 takes a little shy of ~~7 GB~~ 4.5 GB in memory, as it presumably should.

The file structure I used is the classic one, mimicking the one for LLaMA:

📂pygmalion-6b-gptq
┣━━ 📄config.json
┣━━ 📄merges.txt
┣━━ 📄README.md
┣━━ 📄special_tokens_map.json
┣━━ 📄tokenizer_config.json
┣━━ 📄vocab.json
┗━━ 📄added_tokens.json
📄pygmalion-6b-gptq-4bit.pt

Side question: does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and as far as I can tell, correctly.

Brawlence avatar Mar 23 '23 21:03 Brawlence

> It asked for an HF token (which I provided) and then failed to quantize.

The c4 dataset requires Hugging Face authorization; you can use wikitext2 or ptb instead. I'm not sure what the difference is, but I used c4 as in the original GPTQ repo.
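
For example, the same invocation with wikitext2 (a sketch, assuming gptj.py takes the dataset as a positional argument, as in the command above):

# from repositories/GPTQ-for-LLaMa
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev wikitext2 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt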

> the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main

Somehow that .bin file is smaller than mine (I've uploaded mine to Hugging Face too; see the head message).

> does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and as far as I can tell, correctly

But it shouldn't: the webui expects the 4-bit model to be named exactly after your main model folder plus -4bit.pt. Are you sure it loaded? On my GPU it takes 4.5 GB of VRAM, the 8-bit version takes 7.5 GB, and the full 16-bit doesn't fit at all.
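
To illustrate the expected layout (folder and file names taken from the post above; paths are illustrative):

ls models/
# pygmalion-6b-gptq/         <- main model folder
# pygmalion-6b-gptq-4bit.pt  <- checkpoint, named <folder>-4bit.pt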

mayaeary avatar Mar 23 '23 21:03 mayaeary

I'm pretty sure it works. Let me re-bench.

TEST

| state | VRAM |
| --- | --- |
| idle VRAM load | 1.2 GB |
| model is loaded | 4.9 GB |
| generation is triggered | 5.7 GB |

Yep, totally works. And you're correct, it's ~4.5 GB as of now; I don't know why total VRAM was peaking at 7 GB the last time.

Brawlence avatar Mar 23 '23 21:03 Brawlence

There are now GPT-NeoX, GPT-J, 4-bit LoRAs, and GPT-Neo, all with different kernels :(

also GPT-J with offload (https://github.com/AlpinDale/gptq-gptj/commits/main)

Ph0rk0z avatar Mar 23 '23 21:03 Ph0rk0z

Can anybody confirm if the Pygmalion-6b-4bit model works with the latest GPTQ repo and this one?

8WSR0hX avatar Mar 27 '23 21:03 8WSR0hX

It's for the old GPTQ... but it does work (in the old GPTQ).

Ph0rk0z avatar Mar 27 '23 22:03 Ph0rk0z

Is it possible to use Pygmalion-6b-4bit with the --gptq-pre-layer option?

treshphilip avatar Mar 28 '23 09:03 treshphilip

https://github.com/oobabooga/text-generation-webui/pull/615 is the new version; this PR is outdated for now.

mayaeary avatar Mar 28 '23 17:03 mayaeary