text-generation-webui
ValueError: MPTForCausalLM does not support `device_map='auto'` yet.
Describe the bug
Not sure if this is fixable in your code, but here it is:
python server.py --verbose --model-menu --trust-remote-code --load-in-8bit
INFO:Gradio HTTP request redirected to localhost :)
WARNING:trust_remote_code is enabled. This is dangerous.
bin /home/silvacarl/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so
INFO:Loading mosaicml_mpt-7b-instruct...
/home/silvacarl/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-instruct/attention.py:148: UserWarning: Using `attn_impl: torch`. If your model does not use `alibi` or `prefix_lm` we recommend using `attn_impl: flash` otherwise we recommend using `attn_impl: triton`.
warnings.warn('Using `attn_impl: torch`. If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')
ValueError: MPTForCausalLM does not support `device_map='auto'` yet.
If this is not fixable in your code, just close or delete this. I will also research this issue.
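For context, this looks like the same limitation you hit when loading the model directly with transformers; here is a minimal sketch of what --load-in-8bit translates to (assuming transformers and bitsandbytes are installed, and that the loader passes device_map='auto', as the error suggests):

```python
# Minimal sketch of the same failure outside the webui (hedged; model name from the log above).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    trust_remote_code=True,  # MPT ships its own modeling code on the Hub
    load_in_8bit=True,       # what --load-in-8bit turns on
    device_map="auto",       # rejected: MPTForCausalLM does not declare _no_split_modules,
                             # so transformers raises "does not support `device_map='auto'` yet"
)
```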
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
python server.py --verbose --model-menu --trust-remote-code --load-in-8bit
INFO:Gradio HTTP request redirected to localhost :)
WARNING:trust_remote_code is enabled. This is dangerous.
bin /home/silvacarl/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so
INFO:Loading mosaicml_mpt-7b-instruct...
/home/silvacarl/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-instruct/attention.py:148: UserWarning: Using `attn_impl: torch`. If your model does not use `alibi` or `prefix_lm` we recommend using `attn_impl: flash` otherwise we recommend using `attn_impl: triton`.
warnings.warn('Using `attn_impl: torch`. If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')
ValueError: MPTForCausalLM does not support `device_map='auto'` yet.
Screenshot
python server.py --verbose --model-menu --trust-remote-code --load-in-8bit
INFO:Gradio HTTP request redirected to localhost :)
WARNING:trust_remote_code is enabled. This is dangerous.
bin /home/silvacarl/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so
INFO:Loading mosaicml_mpt-7b-instruct...
/home/silvacarl/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-instruct/attention.py:148: UserWarning: Using `attn_impl: torch`. If your model does not use `alibi` or `prefix_lm` we recommend using `attn_impl: flash` otherwise we recommend using `attn_impl: triton`.
warnings.warn('Using `attn_impl: torch`. If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')
ValueError: MPTForCausalLM does not support `device_map='auto'` yet.
Logs
python server.py --verbose --model-menu --trust-remote-code --load-in-8bit
INFO:Gradio HTTP request redirected to localhost :)
WARNING:trust_remote_code is enabled. This is dangerous.
bin /home/silvacarl/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so
INFO:Loading mosaicml_mpt-7b-instruct...
/home/silvacarl/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-instruct/attention.py:148: UserWarning: Using `attn_impl: torch`. If your model does not use `alibi` or `prefix_lm` we recommend using `attn_impl: flash` otherwise we recommend using `attn_impl: triton`.
warnings.warn('Using `attn_impl: torch`. If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')
ValueError: MPTForCausalLM does not support `device_map='auto'` yet.
System Info
WSL, Ubuntu 20.04
If I have neither --auto-devices nor --load-in-8bit on the command line, it loads until it runs out of VRAM on my first 12 GB GPU; it looks like the model needs around 14 GB. With either or both of those options, it throws the device_map='auto' error.
I don't see a way to split it across GPUs at this point, and would like one.
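For reference, the generic transformers way to split a checkpoint across two cards is device_map='auto' plus per-GPU max_memory caps; it won't help until MPTForCausalLM accepts a device_map, so treat this only as a sketch of what a multi-GPU load would look like (the memory limits are illustrative):

```python
from transformers import AutoModelForCausalLM

# Sketch only: the standard accelerate/transformers pattern for spreading ~14 GB of
# weights over two 12 GB GPUs. It still fails for MPT until device_map support lands.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    trust_remote_code=True,
    device_map="auto",
    max_memory={0: "11GiB", 1: "11GiB"},  # illustrative caps, leave headroom per card
)
```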
thx checking that out
I fixed the issue this way: https://github.com/oobabooga/text-generation-webui/issues/1828#issuecomment-1538881613. It works, but it needs more work before it can be merged.
super cool, will check it out when merged.
Just FYI, we are benchmarking these:
SinanAkkoyun/oasst-sft-7-llama-30b databricks/dolly-v2-12b Aeala/GPT4-x-AlpacaDente2-30b NousResearch/gpt4-x-vicuna-13b LLMs/Stable-Vicuna-13B nomic-ai/gpt4all-13b-snoozy togethercomputer/GPT-NeoXT-Chat-Base-20B mosaicml/mpt-7b-instruct mosaicml/mpt-7b-chat TheBloke/koala-13B-HF EleutherAI/pythia-12b mosaicml/mpt-1b-redpajama-200b-dolly stabilityai/stablelm-tuned-alpha-7b TheBloke/wizardLM-7B-HF samwit/koala-7b couchpotato888/alpaca13b OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 THUDM/chatglm-6b stabilityai/stablelm-tuned-alpha-7b TheBloke/wizard-vicuna-13B-HF chaoyi-wu/PMC_LLAMA_7B TheBloke/stable-vicuna-13B-HF
using your API to measure accuracy, the resources needed to run each model, and response times.
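In case it helps anyone reproduce the numbers, the response-time side of that benchmark is roughly the sketch below; it assumes the webui's api extension is running with the blocking endpoint at http://localhost:5000/api/v1/generate, and the prompt is just an example:

```python
import time
import requests

# Hedged sketch: send one prompt to the webui API and record wall-clock response time.
# Endpoint and response shape assume the api extension's blocking /api/v1/generate.
def time_generation(prompt, max_new_tokens=200,
                    url="http://localhost:5000/api/v1/generate"):
    payload = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    start = time.time()
    reply = requests.post(url, json=payload, timeout=600)
    elapsed = time.time() - start
    return reply.json()["results"][0]["text"], elapsed

if __name__ == "__main__":
    text, seconds = time_generation("### Instruction:\nWhat is MPT-7B?\n\n### Response:\n")
    print(f"{seconds:.1f}s  {text[:200]}")
```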
ok, so this is new:
python server.py --verbose --model-menu --trust-remote-code --load-in-8bit
INFO:Gradio HTTP request redirected to localhost :)
WARNING:trust_remote_code is enabled. This is dangerous.
bin /home/silvacarl/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so
INFO:Loading TheBloke_stable-vicuna-13B-HF...
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/d/text-generation-webui/server.py:885 in ...
... `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
any ideas?
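That second error is transformers reporting that the 8-bit model no longer fits entirely on the GPU, so some modules were dispatched to the CPU. Its own suggestion looks roughly like the sketch below; note the option is spelled llm_int8_enable_fp32_cpu_offload on BitsAndBytesConfig in newer transformers releases, and the device_map here is only illustrative (module names follow the LLaMA layout that stable-vicuna uses):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hedged sketch of the error message's suggestion: allow fp32 CPU offload for the
# modules that do not fit, and pass an explicit device_map instead of 'auto'.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # the option the error message refers to
)
device_map = {
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": 0,
    "lm_head": "cpu",  # illustrative: modules mapped to "cpu" stay in fp32
}
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/stable-vicuna-13B-HF",
    quantization_config=quant_config,
    device_map=device_map,
)
```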
How can I disable auto-devices? Is there something like --auto-devices False?
Confirmed it now runs with --load-in-8bit
nice, checking it out!
We are benchmarking both the instruct and chat variants of these:
SinanAkkoyun/oasst-sft-7-llama-30b databricks/dolly-v2-12b Aeala/GPT4-x-AlpacaDente2-30b NousResearch/gpt4-x-vicuna-13b LLMs/Stable-Vicuna-13B nomic-ai/gpt4all-13b-snoozy togethercomputer/GPT-NeoXT-Chat-Base-20B mosaicml/mpt-7b-instruct mosaicml/mpt-7b-chat TheBloke/koala-13B-HF EleutherAI/pythia-12b mosaicml/mpt-1b-redpajama-200b-dolly stabilityai/stablelm-tuned-alpha-7b TheBloke/wizardLM-7B-HF samwit/koala-7b couchpotato888/alpaca13b OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 THUDM/chatglm-6b stabilityai/stablelm-tuned-alpha-7b TheBloke/wizard-vicuna-13B-HF chaoyi-wu/PMC_LLAMA_7B TheBloke/stable-vicuna-13B-HF decapoda-research/llama-13b-hf togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
we can post results back if anyone is interested.
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.