
GPT-J and Pygmalion-6b 4bit

Open mayaeary opened this issue 2 years ago • 6 comments

Support 4-bit GPTQ for GPT-J-6b and Pygmalion-6b.

You need my fork of GPTQ-for-LLaMa for it to work. It was forked from commit 468c47c01b4fe370616747b6d69a2d3f48bab5e4, so it should be compatible with the current version.

mkdir repositories
cd repositories
git clone https://github.com/mayaeary/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git checkout gptj
python setup_cuda.py install
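
If the build succeeds, a quick import check confirms the kernel is visible to Python (quant_cuda is the extension setup_cuda.py builds here; the module name is inferred from the wheel filename mentioned below):

python -c "import quant_cuda; print('quant_cuda OK')"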

To quantize the model:

# from repositories/GPTQ-for-LLaMa
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev c4 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt

It seems to work, but can someone else test it?

UPD. https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt - quantized checkpoint for pygmalion-6b_dev
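
For example, you can fetch it straight into the models folder (the target path is illustrative; adjust to your setup):

wget -P models/ https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt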

mayaeary avatar Mar 23 '23 20:03 mayaeary

Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?

I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows

Brawlence avatar Mar 23 '23 20:03 Brawlence

> Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?
>
> I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows

You can install Visual Studio Build Tools; it's only the compiler and libraries, without the IDE.

I attached this file, but I don't know if it'll work: quant_cuda-0.0.0-py3.10-win-amd64.egg.zip

mayaeary avatar Mar 23 '23 20:03 mayaeary

Whoa. Something wild is going on with gptj.py. It asked for an HF token (which I provided) and then failed to quantize. BUT thanks to your egg file and the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main, it actually worked.

Pyg-6b-q4 takes a little shy of ~~7 GB~~ 4.5 GB in memory, as it presumably should.

The file structure I used is the classic one, mimicking the one for LLaMA:

📂pygmalion-6b-gptq
┣━━ 📄config.json
┣━━ 📄merges.txt
┣━━ 📄README.md
┣━━ 📄special_tokens_map.json
┣━━ 📄tokenizer_config.json
┣━━ 📄vocab.json
┗━━ 📄added_tokens.json
📄pygmalion-6b-gptq-4bit.pt

Side question: does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and as far as I can tell, correctly.

Brawlence avatar Mar 23 '23 21:03 Brawlence

> It asked for an HF token (which I provided) and then failed to quantize.

The c4 dataset requires Hugging Face authorization; you can use wikitext2 or ptb instead. I'm not sure what the difference is, but I used c4 as in the original GPTQ repo.
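
For example, the same invocation with wikitext2 (a sketch, assuming gptj.py takes the dataset as a positional argument, as in the command above):

# from repositories/GPTQ-for-LLaMa
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev wikitext2 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt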

> the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main

Somehow that .bin file is smaller than mine (I've uploaded mine to Hugging Face too; see the head message).

> does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and as far as I can tell, correctly

But it shouldn't: the webui expects the 4-bit model to be named exactly after your main model folder plus -4bit.pt. Are you sure it loaded? On my GPU it takes 4.5 GB of VRAM, the 8-bit version takes 7.5 GB, and the full 16-bit doesn't fit at all.
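
To illustrate the expected layout (folder and file names taken from the post above; paths are illustrative):

ls models/
# pygmalion-6b-gptq/         <- main model folder
# pygmalion-6b-gptq-4bit.pt  <- checkpoint, named <folder>-4bit.pt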

mayaeary avatar Mar 23 '23 21:03 mayaeary

I'm pretty sure it works. Let me re-bench.

TEST

| state | VRAM |
| --- | --- |
| idle VRAM load | 1.2 GB |
| model is loaded | 4.9 GB |
| generation is triggered | 5.7 GB |

Yep, totally works. And you're correct, it's ~4.5 GB as of now; I don't know why total VRAM was peaking at 7 GB the last time.

Brawlence avatar Mar 23 '23 21:03 Brawlence

There are now GPT-NeoX, GPT-J, 4-bit LoRAs, and GPT-Neo, all with different kernels :(

also GPT-J with offload (https://github.com/AlpinDale/gptq-gptj/commits/main)

Ph0rk0z avatar Mar 23 '23 21:03 Ph0rk0z

Can anybody confirm if the Pygmalion-6b-4bit model works with the latest GPTQ repo and this one?

8WSR0hX avatar Mar 27 '23 21:03 8WSR0hX

It's for the old GPTQ... but it does work (in the old GPTQ).

Ph0rk0z avatar Mar 27 '23 22:03 Ph0rk0z

Is it possible to use Pygmalion-6b-4bit with the --gptq-pre-layer option?

treshphilip avatar Mar 28 '23 09:03 treshphilip

https://github.com/oobabooga/text-generation-webui/pull/615 is the new version; this PR is outdated for now.

mayaeary avatar Mar 28 '23 17:03 mayaeary