Quantizing a 4096-context-length model leads to corrupted output
When converting and quantizing CarperAI/pythia-2.8b-deduped-4k (I've added it to the Pythia dict inside config.py, with the only change being block_size=4096), I'm getting nonsensical output. I'd be happy to contribute a PR for a 4k-context model if we can get this working.
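For reference, the entry is essentially a copy of the existing pythia-2.8b-deduped dict with the context length bumped. A minimal sketch (field names are assumed to follow the other Pythia entries in config.py; the values match the config printed in the log below):

```python
# Hypothetical sketch of the added config.py entry; field names copied from
# the existing Pythia entries, values taken from the config printed below.
# The only intended change versus pythia-2.8b-deduped is block_size=4096.
dict(
    org="CarperAI",             # assumed from the HF repo CarperAI/pythia-2.8b-deduped-4k
    name="pythia-2.8b-deduped-4k",
    block_size=4096,            # 4096 instead of the usual 2048
    n_layer=32,
    n_embd=2560,
    n_head=32,
    padding_multiple=128,
),
```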
Running on A10G.
```
python generate/base.py --checkpoint_dir checkpoints/Pythia/pythia-2.8b-deduped-4k/ --quantize gptq.int4 --precision bf16-true --prompt "Hello, my name is"
Loading model 'checkpoints/Pythia/pythia-2.8b-deduped-4k/lit_model_gptq.4bit.pth' with {'block_size': 4096, 'vocab_size': 50254, 'padding_multiple': 128, 'padded_vocab_size': 50304, 'n_layer': 32, 'n_head': 32, 'n_embd': 2560, 'rotary_percentage': 0.25, 'parallel_residual': True, 'bias': True, 'n_query_groups': 32, 'shared_attention_norm': False}
bin /work/lit-parrot/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
Time to instantiate model: 0.73 seconds.
Time to load the model weights: 0.53 seconds.
Global seed set to 1234
Hello, my name is4: L"!!'lo¡'s[]'ch,/ Iicer!ionuityV''as.'irdiDRR'editVutherResCDiPE,ahuhuspedichline. _sDR�LE
```
I didn't know about these extended Pythia models.
Does the generation look fine without quantization?