
Add MPT quantized model support

SilverJim opened this issue 2 years ago

I have tested it with:

Model: https://huggingface.co/4bit/mpt-7b-storywriter-4bit-128g
GPTQ: https://github.com/qwopqwop200/GPTQ-for-LLaMa/commit/5731aa11de56affe6e8c88cea66a171045ad1dce

It is usable with the following command:

python3 server.py --notebook --api --model 4bit_mpt-7b-storywriter-4bit-128g --trust-remote-code --wbits 4 --groupsize 128 --model_type mpt
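Since the command enables --api, here is a minimal sketch of a generation request against it in Python. The port (5000), the /api/v1/generate endpoint, and the response shape are assumptions based on the blocking API the webui shipped at the time; adjust to your setup.

    # Sketch of a completion request against the webui's blocking API.
    # Endpoint, port, and response shape are assumptions; check your install.
    import requests

    response = requests.post(
        "http://localhost:5000/api/v1/generate",
        json={
            "prompt": "Once upon a time,",
            "max_new_tokens": 200,
        },
    )
    print(response.json()["results"][0]["text"])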

SilverJim avatar May 11 '23 18:05 SilverJim

Can you check if this also works for MOSS 4-bit?

oobabooga avatar May 11 '23 18:05 oobabooga

https://github.com/oobabooga/text-generation-webui/blob/34970ea3af8f88c501e58fef2fc5c489c8df2743/modules/GPTQ_loader.py#L100

There is a hardcoded sequence length in _load_quant. Does it work with context sizes over 2048? MPT-Storywriter should support contexts of up to 65k.
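For illustration, a minimal sketch of how that value could be derived from the checkpoint's config instead of being hardcoded; resolve_seqlen is a hypothetical helper, not part of the repo:

    # Hypothetical helper: pick a sequence length from the checkpoint's
    # config.json instead of hardcoding 2048, falling back only when the
    # config does not declare one.
    import json
    from pathlib import Path

    def resolve_seqlen(model_dir: str, fallback: int = 2048) -> int:
        config_file = Path(model_dir) / "config.json"
        if not config_file.exists():
            return fallback
        config = json.loads(config_file.read_text())
        # MPT declares "max_seq_len"; LLaMA-style configs use
        # "max_position_embeddings".
        for key in ("max_seq_len", "max_position_embeddings"):
            if key in config:
                return int(config[key])
        return fallback

    # e.g. model.seqlen = resolve_seqlen("models/4bit_mpt-7b-storywriter-4bit-128g")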

mayaeary avatar May 15 '23 11:05 mayaeary

IMHO, this is an important update. We need models with 65k context!

janvarev avatar May 21 '23 08:05 janvarev

I find that this model loads fine when setting model_type to llama:

python server.py --model 4bit_mpt-7b-storywriter-4bit-128g --trust-remote-code --model_type llama

@mayaeary if I increase seqlen to 36000 and raise "Truncate the prompt up to this length" to 8192 under "Parameters", the model does generate, but this is hacky and I have no idea if it is the right way to do it (what even is seqlen?).

oobabooga avatar May 24 '23 04:05 oobabooga

@oobabooga per https://huggingface.co/4bit/mpt-7b-storywriter-4bit-128g/blob/main/config.json, the model does define "max_seq_len": 65536; you could stick with that.
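As a quick check, the declared context length can be read with transformers; a sketch, with trust_remote_code required because MPT ships custom modeling code:

    # Read the context length the checkpoint declares for itself.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained(
        "4bit/mpt-7b-storywriter-4bit-128g",
        trust_remote_code=True,
    )
    print(config.max_seq_len)  # 65536, per the config.json linked above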

yhyu13 avatar May 25 '23 03:05 yhyu13

I'll close this issue because

  1. MPT loads fine with model_type = llama
  2. MPT is not officially supported by gptq-for-llama, so defining an "mpt" model_type is undefined behavior

Support should soon be added in a more proper way to https://github.com/PanQiWei/AutoGPTQ, and the model can already be loaded with --load-in-4bit starting from the 16-bit weights.
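For reference, a minimal sketch of that --load-in-4bit route, assuming a transformers build with bitsandbytes 4-bit support; the model id points at the 16-bit MosaicML weights:

    # On-the-fly 4-bit quantization of the 16-bit weights via bitsandbytes.
    # Assumes a transformers version with 4-bit support and bitsandbytes installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mosaicml/mpt-7b-storywriter"  # 16-bit source weights
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_4bit=True,        # quantize at load time
        device_map="auto",
        trust_remote_code=True,   # MPT uses custom modeling code
    )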

oobabooga avatar May 31 '23 01:05 oobabooga