[Bug] llava, cuda out of memory
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
Describe the bug
I have one A100 GPU card. Following the instructions (https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/vl_pipeline.md), I ran the Hello World llava program and got an error. The model is llava-v1.5-7b, which is not very big, so why does a "cuda out of memory" error occur?
Error info:
Exception in thread Thread-139 (_create_weight_func):
Traceback (most recent call last):
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
_threading_Thread_run(self)
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 199, in _create_weight_func
model_comm.create_shared_weights(device_id, rank)
RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32
Exception in thread Thread-140 (_get_params):
Traceback (most recent call last):
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
_threading_Thread_run(self)
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 215, in _get_params
out = model_comm.get_params(device_id, rank)
RuntimeError: [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:417
Reproduction
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

model_path = '/media/star/8T/model/gpt/llava/llava-v1.5-7b'
image_path = '/media/star/8T/tmp/gpt4v/1/1.png'
question = 'describe the image in detail'

# build the pipeline with the vicuna chat template
pipe = pipeline(model_path,
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))

# load the image from its path and run the query
image = load_image(image_path)
response = pipe((question, image))
print(response)
Environment
Can't find the lmdeploy check_env file, so no environment info is attached.
Error traceback
No response
The loading process of a VLM is: load the vision model -> load the LLM weights -> allocate the KV cache.
For llava-v1.5-7b, the first two steps take up about 14.5 GB of CUDA memory. But according to your log, the out-of-memory error occurred in step 2. Are there any other programs taking up GPU memory?
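As a quick check, here is a minimal sketch (assuming the installed lmdeploy exposes TurbomindEngineConfig.cache_max_entry_count) that prints the free memory right before building the pipeline and shrinks the fraction reserved for the KV cache:

# Sketch: check free GPU memory, then build the pipeline with a smaller
# KV-cache fraction. cache_max_entry_count is the fraction of free GPU memory
# reserved for the KV cache (assumed available in the installed lmdeploy).
import torch
from lmdeploy import pipeline, ChatTemplateConfig, TurbomindEngineConfig

free, total = torch.cuda.mem_get_info()  # bytes free / total on the current GPU
print(f'free: {free / 2**30:.1f} GiB, total: {total / 2**30:.1f} GiB')

pipe = pipeline(
    '/media/star/8T/model/gpt/llava/llava-v1.5-7b',
    backend_config=TurbomindEngineConfig(cache_max_entry_count=0.4),
    chat_template_config=ChatTemplateConfig(model_name='vicuna'),
)

If the printed free memory is already well below ~14.5 GiB, something else (or a previous run in the same notebook kernel) is still holding the card.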
No, there is only one lmdeploy program running on the GPU.
Can you try running the code without Jupyter or IPython?
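For example, a minimal standalone script (a sketch reusing the paths from the report; the file name repro.py is just a placeholder) that can be run with plain python instead of a notebook kernel:

# Save as e.g. repro.py and run with `python repro.py` so no ipykernel
# threads are involved. Paths are the reporter's own.
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image


def main():
    model_path = '/media/star/8T/model/gpt/llava/llava-v1.5-7b'
    image = load_image('/media/star/8T/tmp/gpt4v/1/1.png')
    pipe = pipeline(model_path,
                    chat_template_config=ChatTemplateConfig(model_name='vicuna'))
    print(pipe(('describe the image in detail', image)))


if __name__ == '__main__':
    main()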
I only have one card and it's currently running a program, so I can't test it right now. I'll test it next week and will share the results then.
Did you solve this problem? I encountered exactly the same problem.
I didn't solve this problem; I switched to a different architecture instead.
Which architecture did you switch to?
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.