[BUG] Running MiniChat-2-3B on Win11 WSL2 fails
Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- [X] I have searched FAQ
Current Behavior
The GPU has 12 GB of VRAM. I used the following command: `bash ./run.sh -c local -i 0 -b hf -m MiniChat-2-3B -t minichat`. It fails at startup with the output below:
Console output:
qanything-container-local | The LLM service is starting up; this may take a while... you have time to make a coffee :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 14099 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | The LLM service is starting up; this may take a while... you have time to make a coffee :)
qanything-container-local | Starting the LLM service timed out; automatically checking /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log for errors...
qanything-container-local | No clear error message was detected in /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log. Please inspect /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log manually for more information.
fschat_model_worker_7801.log:
2024-04-09 17:04:57 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/MiniChat-2-3B', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='minichat', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-04-09 17:04:57 | INFO | model_worker | Loading the model ['MiniChat-2-3B'] on worker e44e3aad ...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-04-09 17:04:58 | ERROR | stderr |
0%| | 0/1 [00:00<?, ?it/s]
Expected Behavior
No response
Environment
- OS: Windows 11 WSL2
- NVIDIA Driver: 537.70
- CUDA:
- docker: Docker Desktop 4.28.0 (139021)
- docker-compose:
- NVIDIA GPU: RTX 3060
- NVIDIA GPU Memory: 12G
QAnything logs
(Identical to the console output and fschat_model_worker_7801.log contents pasted under "Current Behavior" above.)
Steps To Reproduce
No response
Anything else?
No response
I am hitting the same problem with the same configuration, and the log errors are identical. Is there any solution?
The same error also appears in a Linux environment:
0%| | 0/1 [00:00<?, ?it/s]
1 10:58:03 | ERROR | stderr |
According to this issue, the problem can be resolved by adding more RAM or swap: https://github.com/oobabooga/text-generation-webui/issues/2509
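For the WSL2 setups in this thread, the usual way to raise memory and swap limits is a `.wslconfig` file in the Windows user profile. A minimal sketch follows; the sizes are illustrative assumptions, not verified requirements for MiniChat-2-3B:

```ini
; %UserProfile%\.wslconfig -- apply by running `wsl --shutdown`, then restart WSL
[wsl2]
memory=16GB   ; RAM available to the WSL2 VM (default is about half of host RAM)
swap=32GB     ; larger swap lets model loading spill over instead of being OOM-killed
```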
It feels like it is trying to download something over the network, but the network is unreachable.
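To rule the network theory in or out, one quick check is whether Hugging Face is reachable from inside the container. This is only a sketch: the container name is taken from the logs above, and `HF_ENDPOINT` (honored by huggingface_hub) matters only if the worker actually attempts a download rather than loading the local files under /model_repos/CustomLLM/MiniChat-2-3B:

```bash
# Check connectivity to huggingface.co from inside the QAnything container
docker exec qanything-container-local curl -sI https://huggingface.co | head -n 1

# If huggingface.co is unreachable, a mirror can be tried by exporting
# HF_ENDPOINT before starting the service (assumption: the worker goes
# through huggingface_hub, which honors this variable)
export HF_ENDPOINT=https://hf-mirror.com
```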
- OS: Windows 11 WSL2
- NVIDIA Driver: 537.70
- CUDA:
- docker: Docker version 25.0.2
- docker-compose:
- NVIDIA GPU: RTX 4080
- NVIDIA GPU Memory: 16G

With this configuration, the log errors are identical to the OP's. Is there any solution?
+1, no idea what the problem is.
The worker ran out of memory and was killed by the system (OOM). Check memory usage while it is running.
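A minimal way to confirm the OOM theory, assuming you can open a shell in WSL2 (or on the Linux host) while the model loads:

```bash
# Watch RAM and swap usage in real time while run.sh loads the model
watch -n 1 free -h

# After the worker dies, check whether the kernel OOM-killer terminated it
sudo dmesg | grep -iE 'killed process|out of memory'
```

If `dmesg` shows the worker process being killed, increasing RAM or swap (e.g. via `.wslconfig` as sketched above) is the likely fix.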