Quantizing a 4096-context-length model leads to corrupted output
When converting and quantizing CarperAI/pythia-2.8b-deduped-4k (I've added it to the Pythia dict inside config.py, with the only change being block_size=4096), I'm getting nonsensical output. I'd be happy to contribute a PR for a 4k-context model if we can get this working.
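For reference, the entry is essentially a copy of the existing pythia-2.8b-deduped dict with the context length bumped. A minimal sketch (field names are assumed to follow the other Pythia entries in config.py; the values match the config printed in the log below):

```python
# Hypothetical sketch of the added config.py entry; field names copied from
# the existing Pythia entries, values taken from the config printed below.
# The only intended change versus pythia-2.8b-deduped is block_size=4096.
dict(
    org="CarperAI",             # assumed from the HF repo CarperAI/pythia-2.8b-deduped-4k
    name="pythia-2.8b-deduped-4k",
    block_size=4096,            # 4096 instead of the usual 2048
    n_layer=32,
    n_embd=2560,
    n_head=32,
    padding_multiple=128,
),
```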
Running on A10G.
```
python generate/base.py --checkpoint_dir checkpoints/Pythia/pythia-2.8b-deduped-4k/ --quantize gptq.int4 --precision bf16-true --prompt "Hello, my name is"
Loading model 'checkpoints/Pythia/pythia-2.8b-deduped-4k/lit_model_gptq.4bit.pth' with {'block_size': 4096, 'vocab_size': 50254, 'padding_multiple': 128, 'padded_vocab_size': 50304, 'n_layer': 32, 'n_head': 32, 'n_embd': 2560, 'rotary_percentage': 0.25, 'parallel_residual': True, 'bias': True, 'n_query_groups': 32, 'shared_attention_norm': False}
bin /work/lit-parrot/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
Time to instantiate model: 0.73 seconds.
Time to load the model weights: 0.53 seconds.
Global seed set to 1234
Hello, my name is4: L"!!'lo¡'s[]'ch,/ Iicer!ionuityV''as.'irdiDRR'editVutherResCDiPE,ahuhuspedichline. _sDR�LE
```
I didn't know about these extended Pythia models.
Does the generation look fine without quantization?