DB-GPT
[Bug] llmserver.py throws an error after reaching Loading checkpoint shards: 100%
Search before asking
- [X] I had searched in the issues and found no similar issues.
Operating system information
Linux
Python version information
3.10
DB-GPT version
main
Related scenes
- [X] Chat Data
- [ ] Chat Excel
- [ ] Chat DB
- [ ] Chat Knowledge
- [ ] Model Management
- [ ] Dashboard
- [ ] Plugins
Installation Information
- [ ] AutoDL Image
- [ ] Other
Device information
T4 GPU; GPU count: 1; VRAM: 15 GB
Models information
vicuna-13b-v1.5 (`"load_in_4bit": true`), text2vec-large-chinese
What happened
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [02:30<00:00, 50.26s/it]
/home/miniconda3/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/miniconda3/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
2023-10-13 08:57:01 k161ae pilot.model.loader[4436] INFO Current model is type of: LlamaForCausalLM, load tokenizer by LlamaTokenizer
2023-10-13 08:57:01 k161ae pilot.model.cluster.worker.manager[4436] ERROR Error starting worker manager: expected str, bytes or os.PathLike object, not NoneType
2023-10-13 08:57:01 k161ae asyncio[4436] ERROR Task exception was never retrieved
future: <Task finished name='Task-3' coro=<_setup_fastapi.
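A note on the two `UserWarning`s in the log: they are benign on their own. They only mean that `temperature` and `top_p` in the model's `generation_config.json` are ignored while `do_sample` is `False`. A minimal pure-Python sketch of the consistency check transformers performs (illustrative, not DB-GPT or transformers code):

```python
def check_generation_config(cfg: dict) -> list[str]:
    """Flag sampling-only knobs that are ignored when do_sample is False,
    mirroring the transformers warning quoted in the log above."""
    problems = []
    if not cfg.get("do_sample", False):
        for knob in ("temperature", "top_p"):
            if knob in cfg:
                problems.append(
                    f"`{knob}` is set to {cfg[knob]} but only used when do_sample=True"
                )
    return problems

# The config vicuna-13b-v1.5 ships with triggers both warnings:
print(check_generation_config({"do_sample": False, "temperature": 0.9, "top_p": 0.6}))
```

So these warnings are unrelated to the startup failure; the real error is the `NoneType` path error on the next line.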
What you expected to happen
1. Is the T4 GPU running out of VRAM?
How to reproduce
Run: `python /home/DB-GPT/pilot/server/llmserver.py`
Additional context
No response
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
@gantao21 Hi, I'd suggest first trying a 7B model, either with quantization off or with 8-bit quantization on. Also pull the latest main branch code and deploy according to the latest documentation.
If you still have problems, you can run the command `dbgpt trace chat` to export the relevant information, which will make it easier for us to troubleshoot together.
Has this issue been resolved? I hit the same problem running on a V100.
That is a VRAM shortage issue; try the suggestions from my comment above: start with a 7B model, with quantization off or 8-bit quantization on, pull the latest main branch code, and deploy per the latest documentation.
If problems remain, run `dbgpt trace chat` to export the relevant information so we can troubleshoot together.
vicuna-7b works; Baichuan-7B does not.
OK, I'll try other models. What puzzles me is: the V100 has 32 GB of VRAM, and the model only occupied about 40% of it at startup. I'm deploying version 0.4.0 now; version 0.3.4 could run 13b-v1.5 on the same machine.
That doesn't sound right. Could you send the details? I'd like to look at the information exported by `dbgpt trace chat`.
| Config Key (Webserver) | Config Value (Webserver) |
|---|---|
| host | 0.0.0.0 |
| port | 5000 |
| daemon | False |
| controller_addr | None |
| model_name | None |
| share | False |
| remote_embedding | False |
| log_level | None |
| light | False |
| log_file | dbgpt_webserver.log |
| tracer_file | dbgpt_webserver_tracer.jsonl |

| Config Key (EmbeddingModel) | Config Value (EmbeddingModel) |
|---|---|
| model_name | text2vec |
| model_path | DBGPT_v0.4.0/DB-GPT-0.4.0/models/text2vec-large-chinese |
| device | cuda |
| normalize_embeddings | None |

| Config Key (ModelWorker) | Config Value (ModelWorker) |
|---|---|
| model_name | vicuna-13b-v1.5 |
| model_path | /DB-GPT-0.4.0/models/vicuna-13b-v1.5 |
| device | cuda |
| model_type | huggingface |
| prompt_template | None |
| max_context_size | 4096 |
| num_gpus | None |
| max_gpu_memory | None |
| cpu_offloading | False |
| load_8bit | True |
| load_4bit | False |
| quant_type | nf4 |
| use_double_quant | True |
| compute_dtype | None |
| trust_remote_code | True |
| verbose | False |

| Config Key (WorkerManager) | Config Value (WorkerManager) |
|---|---|
| model_name | vicuna-13b-v1.5 |
| model_path | /DBGPT_v0.4.0/DB-GPT-0.4.0/models/vicuna-13b-v1.5 |
| worker_type | None |
| worker_class | None |
| model_type | huggingface |
| host | 0.0.0.0 |
| port | 5000 |
| daemon | False |
| limit_model_concurrency | 5 |
| standalone | True |
| register | True |
| worker_register_host | None |
| controller_addr | http://127.0.0.1:5000 |
| send_heartbeat | True |
| heartbeat_interval | 20 |
| log_level | None |
| log_file | dbgpt_model_worker_manager.log |
| tracer_file | dbgpt_model_worker_manager_tracer.jsonl |

ModelWorker System information:

| System Config Key | System Config Value |
|---|---|
| platform | linux |
| python_version | 3.10.13 |
| cpu | Intel(R) Xeon(R) Gold 6278C CPU @ 2.60GHz |
| cpu_avx | AVX512 |
| memory | 263601180 kB |
| torch_version | 2.0.1+cu117 |
| device | cuda |
| device_version | 11.7 |
| device_count | 4 |
| device_other | 4x Tesla V100S-PCIE-32GB, driver 495.29.05, 32510 MiB total / 32506 MiB free / 4 MiB used each |
@lv-stupidboy So vicuna-13b-v1.5 starts normally, but you hit out-of-memory exceptions in some usage scenarios? I see you have already enabled 8-bit quantization; with 32 GB of VRAM, a single user shouldn't be able to exhaust it.
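Rough back-of-envelope arithmetic for the weights alone (KV cache and activations come on top) supports that view; these are approximations, not measured numbers:

```python
# Approximate VRAM needed just to hold 13B parameters at different precisions.
params = 13e9
GiB = 2**30

fp16_gib = params * 2 / GiB    # ~24.2 GiB: tight on a single 32 GB V100
int8_gib = params * 1 / GiB    # ~12.1 GiB: comfortable with load_8bit=True
nf4_gib  = params * 0.5 / GiB  # ~6.1 GiB: why 4-bit was tried on the 15 GB T4
print(f"fp16~{fp16_gib:.1f} GiB, int8~{int8_gib:.1f} GiB, nf4~{nf4_gib:.1f} GiB")
```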
But now, at service startup, I see exactly the same symptom as the user above: the service fails to start. `Exception: model vicuna-13b-v1.5@huggingface(xx.xx.xx.xx:7860) start failed, All connection attempts failed`
It's hard to reproduce your problem at the moment. I see you have four GPUs; consider using all of them. Try turning quantization off and forcing a maximum GPU memory limit per card:

```
QUANTIZE_8bit=False
QUANTIZE_4bit=False
CUDA_VISIBLE_DEVICES=0,1,2,3
MAX_GPU_MEMORY=8Gib
```
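For context, a per-card cap like `MAX_GPU_MEMORY=8Gib` typically ends up as a `max_memory` dict handed to Hugging Face `from_pretrained(device_map="auto", ...)`, so accelerate shards layers without exceeding any card's budget. A hedged sketch (the helper name is mine, not DB-GPT's):

```python
def build_max_memory(num_gpus: int, cap: str = "8GiB") -> dict:
    # One entry per visible GPU index, each capped at the same budget;
    # the device_map="auto" planner then keeps every card under its cap.
    return {i: cap for i in range(num_gpus)}

# With CUDA_VISIBLE_DEVICES=0,1,2,3 this would yield {0: "8GiB", ..., 3: "8GiB"}:
mm = build_max_memory(4)
# e.g. AutoModelForCausalLM.from_pretrained(path, device_map="auto", max_memory=mm)
```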
@fangyinc I adjusted the configuration, but the behavior is still the same:

```
envs/dbgpt040/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
envs/dbgpt040/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
ERROR [pilot.model.cluster.worker.manager] Error starting worker manager: model vicuna-13b-v1.5@huggingface(xx.xx.xx.xx:7860) start failed, All connection attempts failed
ERROR [asyncio] Task exception was never retrieved
future: <Task finished name='Task-3' coro=<_setup_fastapi.<locals>.startup_event.<locals>.start_worker_manager() done, defined at DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py:758> exception=SystemExit(1)>
Traceback (most recent call last):
  File "DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py", line 760, in start_worker_manager
    await worker_manager.start()
  File "DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py", line 578, in start
    return await self.worker_manager.start()
  File "DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py", line 116, in start
    raise Exception(out.message)
Exception: model vicuna-13b-v1.5@huggingface(xx.xx.xx.xx:7860) start failed, All connection attempts failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py", line 763, in start_worker_manager
    sys.exit(1)
SystemExit: 1
INFO [pilot.model.cluster.worker.manager] Stop all workers
INFO [pilot.model.cluster.worker.manager] Apply req: None, apply_func: <function LocalWorkerManager._stop_all_worker.<locals>._stop_worker at 0x7f93b03ffe20>
INFO [pilot.model.cluster.worker.manager] Apply to all workers
WARNI [pilot.model.cluster.worker.manager] Stop worker, ignored exception from deregister_func: All connection attempts failed
INFO [pilot.utils.model_utils] Clear torch cache of device: cuda:0
INFO [pilot.utils.model_utils] Clear torch cache of device: cuda:1
INFO [pilot.utils.model_utils] Clear torch cache of device: cuda:2
INFO [pilot.utils.model_utils] Clear torch cache of device: cuda:3
WARNI [pilot.model.cluster.worker.manager] Stop worker, ignored exception from deregister_func: All connection attempts failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "envs/dbgpt040/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py", line 763, in start_worker_manager
    sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "envs/dbgpt040/lib/python3.10/site-packages/starlette/routing.py", line 686, in lifespan
    await receive()
  File "envs/dbgpt040/lib/python3.10/site-packages/uvicorn/lifespan/on.py", line 137, in receive
    return await self.receive_queue.get()
  File "envs/dbgpt040/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError
```
Could you also share the other errors? There should be a more specific cause logged before these generic errors.
All of the logs printed at service startup are below; please take a look: `# python pilot/server/dbgpt_server.py --host xx.xx.xx.xx --port 7860`
=========================== WebWerverParameters ===========================
host: xx.xx.xx.xx
port: 7860
daemon: False
controller_addr: None
model_name: None
share: False
remote_embedding: False
log_level: INFO
light: False
log_file: dbgpt_webserver.log
tracer_file: dbgpt_webserver_tracer.jsonl
======================================================================
4e05d94b5799 (head)
heads:None
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
Generating DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/meta_data/alembic/versions/91c18e894c6e_dbgpt_ddl_upate.py ... done
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 4e05d94b5799 -> 91c18e894c6e, dbgpt ddl upate
INFO [pilot.model.cluster.worker.embedding_worker] [EmbeddingsModelWorker] Parameters of device is None, use cuda
WARNI [sentence_transformers.SentenceTransformer] No sentence-transformers model found with name DBGPT_v0.4.0/DB-GPT-0.4.0/models/text2vec-large-chinese. Creating a new one with MEAN pooling.
Model Unified Deployment Mode!
INFO: Started server process [1872205]
INFO: Waiting for application startup.
INFO [pilot.model.cluster.worker.manager] Begin start all worker, apply_req: None
INFO [pilot.model.cluster.worker.manager] Apply req: None, apply_func: <function LocalWorkerManager._start_all_worker.
=========================== ModelParameters ===========================
model_name: vicuna-13b-v1.5
model_path: DBGPT_v0.4.0/DB-GPT-0.4.0/models/vicuna-13b-v1.5
device: cuda
model_type: huggingface
prompt_template: None
max_context_size: 4096
num_gpus: None
max_gpu_memory: 8Gib
cpu_offloading: False
load_8bit: False
load_4bit: False
quant_type: nf4
use_double_quant: True
compute_dtype: None
trust_remote_code: True
verbose: False
======================================================================
INFO: Uvicorn running on http://xx.xx.xx.xx:7860 (Press CTRL+C to quit)
INFO [pilot.model.loader] There has max_gpu_memory from config: 8Gib
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:18<00:00, 6.27s/it]
/envs/dbgpt040/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/envs/dbgpt040/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
ERROR [pilot.model.cluster.worker.manager] Error starting worker manager: model vicuna-13b-v1.5@huggingface(xx.xx.xx.xx:7860) start failed, All connection attempts failed
ERROR [asyncio] Task exception was never retrieved
future: <Task finished name='Task-3' coro=<_setup_fastapi.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/DBGPT_v0.4.0/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py", line 763, in start_worker_manager
sys.exit(1)
SystemExit: 1
INFO [pilot.model.cluster.worker.manager] Stop all workers
INFO [pilot.model.cluster.worker.manager] Apply req: None, apply_func: <function LocalWorkerManager._stop_all_worker.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/envs/dbgpt040/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/DB-GPT-0.4.0/pilot/model/cluster/worker/manager.py", line 763, in start_worker_manager
    sys.exit(1)
SystemExit: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/envs/dbgpt040/lib/python3.10/site-packages/starlette/routing.py", line 686, in lifespan
    await receive()
  File "/envs/dbgpt040/lib/python3.10/site-packages/uvicorn/lifespan/on.py", line 137, in receive
    return await self.receive_queue.get()
  File "/envs/dbgpt040/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError
@lv-stupidboy Hi, in the startup command `python pilot/server/dbgpt_server.py --host xx.xx.xx.xx --port 7860`, is `xx.xx.xx.xx` something you substituted in? I see you are starting in single-machine mode; in your scenario this parameter can be left out. The default is `0.0.0.0`, which means listening on all of the machine's IP addresses.
@fangyinc Nothing special, I masked the IP before posting. I added the host and IP because the service runs on a Linux machine and I need to access the web service via ip:port; with the default parameters I couldn't reach it from the web.
So does it start normally when you don't pass the `--host` parameter?
This problem is most likely related to the IP address you pass via `--host`. In unified deployment mode, DB-GPT starts several components by default, and one of them needs an `http://127.0.0.1:port` address for communication. Because you started with an explicit host, `http://127.0.0.1:port` can no longer be used for normal inter-service communication.
In principle, `--host 0.0.0.0` already listens on all local addresses; once the service is up, you can definitely reach it in a browser at `http://ip:port` (just be careful not to browse to `http://0.0.0.0:port`).
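The bind-address point can be demonstrated with a tiny stdlib sketch: a server bound to `0.0.0.0` accepts connections on any local address, including `127.0.0.1`, whereas a server bound to one external IP would not be reachable over loopback, which is why forcing `--host` breaks the inter-component calls:

```python
import socket

# Bind to the wildcard address; port 0 lets the OS pick a free port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("0.0.0.0", 0))
srv.listen(1)
port = srv.getsockname()[1]

# A loopback client can still connect, because 0.0.0.0 covers 127.0.0.1.
cli = socket.create_connection(("127.0.0.1", port))
conn, _ = srv.accept()
cli.close(); conn.close(); srv.close()
```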
Is the default port 7860? When I start without the parameter, there is no error message and the process stays alive, but port 7860 is not in a listening state, and I can't access the web UI from a browser.
The service probably did start normally, but the web UI can't be reached; there may be a firewall or a port whitelist restricting access.
@lv-stupidboy Has the web access problem been resolved?
After a few more tries, the service started normally. The startup command is now `python dbgpt_server.py --port 7860`.
This issue has been marked as stale, because it has been over 30 days without any activity.
This issue has been closed, because it has been marked as stale and there has been no activity for over 7 days.