Device does not support bf16 on V100
1: I ran the command `lmdeploy serve api_server /root/autodl-tmp/InternVL-Chat-V1-5 --server-port 8080` and got:
```
(internvl-deploy) root@autodl-container-b2b911ba00-2d4424e7:~# lmdeploy serve api_server /root/autodl-tmp/InternVL-Chat-V1-5 --server-port 8080
FlashAttention is not installed.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Device does not support bf16.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Exception in thread Thread-3:
Traceback (most recent call last):
File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 189, in _create_weight_func
model_comm.create_shared_weights(device_id, rank)
RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32
Exception in thread Thread-4:
Traceback (most recent call last):
File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 204, in _get_params
out = model_comm.get_params(device_id, rank)
RuntimeError: [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:418
```
2: How can I fix this so the server starts normally?
This is an out-of-GPU-memory problem. InternVL-Chat-V1-5 has roughly 26B parameters, so the fp16 weights alone take about 50 GB; it needs at least 2 V100s, and 4 is more comfortable.
- Also try lowering `--cache-max-entry-count` to 0.2 (the default is 0.8), e.g.:
```
lmdeploy serve api_server /mnt/share_model/InternVL-Chat-V1-5 \
    --server-name 0.0.0.0 \
    --server-port 23333 \
    --tp 4 \
    --cache-max-entry-count 0.2 \
    --vision-max-batch-size 1 \
    --max-batch-size 64 \
    --quant-policy 0
```
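Before launching the server, you can sanity-check the same memory settings with lmdeploy's Python pipeline API. A minimal sketch, assuming a recent lmdeploy release with the VLM `pipeline` / `TurbomindEngineConfig` interface; the model path is the one from the command above and the sample image URL is the one used in lmdeploy's docs:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Mirror the CLI flags above: tp=4 shards the weights across 4 GPUs,
# and cache_max_entry_count=0.2 caps the k/v cache at 20% of the GPU
# memory left after the weights are loaded (the default is 0.8).
backend_config = TurbomindEngineConfig(tp=4, cache_max_entry_count=0.2)
pipe = pipeline('/mnt/share_model/InternVL-Chat-V1-5',
                backend_config=backend_config)

image = load_image(
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
print(pipe(('describe this image', image)).text)
```

If this one-shot inference already OOMs, the server will too, and you need more GPUs or a smaller cache fraction.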
I have a similar error: bf16 is not supported on your device
This issue has been inactive for over two weeks. If the problem is still unresolved, please feel free to open a new issue to ask your question. Thank you.
Hello, you can use float16 instead of bfloat16.
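One way to do that is to save a float16 copy of the checkpoint with transformers and serve that instead. A sketch under stated assumptions: the `-fp16` output path is illustrative, and `torch.cuda.is_bf16_supported()` is just a quick way to confirm the device limitation first:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# V100 is compute capability 7.0; bf16 needs Ampere (8.0) or newer,
# so this prints False on V100.
print(torch.cuda.is_bf16_supported())

src = '/root/autodl-tmp/InternVL-Chat-V1-5'
dst = '/root/autodl-tmp/InternVL-Chat-V1-5-fp16'  # hypothetical output path

# Load the checkpoint with float16 weights and save a converted copy
# that V100 can run; trust_remote_code is required for InternVL.
model = AutoModel.from_pretrained(src,
                                  torch_dtype=torch.float16,
                                  low_cpu_mem_usage=True,
                                  trust_remote_code=True)
model.save_pretrained(dst)
AutoTokenizer.from_pretrained(src, trust_remote_code=True).save_pretrained(dst)
```

Then point `lmdeploy serve api_server` at the converted directory. Note that this only removes the bf16 warning; the out-of-memory error above still requires enough total GPU memory for the weights.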