V100: Device does not support bf16

Open starise-wg opened this issue 1 year ago • 3 comments

1: I ran the command `lmdeploy serve api_server /root/autodl-tmp/InternVL-Chat-V1-5 --server-port 8080` and got the following output:

```
(internvl-deploy) root@autodl-container-b2b911ba00-2d4424e7:~# lmdeploy serve api_server /root/autodl-tmp/InternVL-Chat-V1-5 --server-port 8080
FlashAttention is not installed.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Device does not support bf16.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 189, in _create_weight_func
    model_comm.create_shared_weights(device_id, rank)
RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/internvl-deploy/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 204, in _get_params
    out = model_comm.get_params(device_id, rank)
RuntimeError: [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:418
```

2: How should this be fixed so that the server can be used normally?

starise-wg · Jun 11 '24 09:06

This is a GPU-memory problem: InternVL-Chat-V1-5 needs at least 2 V100s, and 4 is more comfortable.

  • Try lowering --cache-max-entry-count to 0.2 (the default is 0.8), for example:
```
lmdeploy serve api_server /mnt/share_model/InternVL-Chat-V1-5 \
  --server-name 0.0.0.0 \
  --server-port 23333 \
  --tp 4 \
  --cache-max-entry-count 0.2 \
  --vision-max-batch-size 1 \
  --max-batch-size 64 \
  --quant-policy 0
```
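
The same memory knobs can also be set through lmdeploy's Python pipeline API instead of the CLI. A minimal sketch, assuming 4 visible GPUs and the TurboMind backend:

```python
# Minimal sketch: the memory-related settings from the command above,
# applied via lmdeploy's Python API. Assumes 4 visible GPUs.
from lmdeploy import pipeline, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(
    tp=4,                       # shard the weights across 4 GPUs
    cache_max_entry_count=0.2,  # KV cache may use 20% of free GPU memory (default 0.8)
    quant_policy=0,             # no KV-cache quantization
)

pipe = pipeline('/mnt/share_model/InternVL-Chat-V1-5', backend_config=engine_config)
print(pipe('Describe this model in one sentence.').text)
```

Lowering cache_max_entry_count mainly trades throughput for headroom: less GPU memory is reserved for the KV cache, leaving more room for the roughly 50 GB of fp16 weights of this ~26B-parameter model spread across the cards.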

BIGBALLON · Jun 12 '24 05:06

I have a similar error: bf16 is not supported on your device

ElhamAhmedian · Aug 13 '24 10:08

This issue has been inactive for over two weeks. If the problem is still unresolved, please feel free to open a new issue to ask your question. Thank you.

zmyzxb · Aug 14 '24 07:08

Hello, you can use float16 instead of bfloat16.

czczup · Aug 26 '24 05:08
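
For reference, a minimal sketch of that float16 fallback using the plain transformers loading route rather than lmdeploy. The model path is the one from the first post; `device_map='auto'` assumes `accelerate` is installed, and `torch.cuda.is_bf16_supported()` picks float16 automatically on pre-Ampere GPUs such as the V100:

```python
# Minimal sketch: load InternVL-Chat-V1-5 in float16 on GPUs without native
# bf16 support (e.g. V100), and in bfloat16 on Ampere and newer.
import torch
from transformers import AutoModel, AutoTokenizer

path = '/root/autodl-tmp/InternVL-Chat-V1-5'
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=dtype,        # float16 on V100, bfloat16 where supported
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto',        # spread the weights across all visible GPUs
).eval()
```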