llama-cpp-python
When deploying the llama-cpp-python llava server in k8s as a service, it can only answer questions about the first image
@abetlen Hello, when I deploy a llava-13b service with `python -m llama_cpp.server` on the Kubernetes platform, I notice that only the first image is answered correctly. When I switch to another image, the responses become completely confused; the only workaround is to restart the service and resubmit the query. I suspect that the previous image's state is not being released properly, so subsequent images are not parsed on their own but instead seem to be merged with earlier images before being fed into the model, leading to highly inaccurate responses. This problem still seems to be related to VRAM. How should I resolve this issue?
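For context, here is roughly how the service is queried, as a minimal reproduction sketch. The launch flags in the comment, the endpoint URL, and the image paths are placeholders for my actual deployment, not the exact values:

```python
# Assumed server launch (placeholder paths):
#   python -m llama_cpp.server --model llava-13b.gguf \
#       --clip_model_path mmproj.gguf --chat_format llava-1-5
import base64

import requests

SERVER_URL = "http://localhost:8000/v1/chat/completions"  # placeholder endpoint


def image_to_data_url(path: str) -> str:
    """Encode a local image as a base64 data URL for the OpenAI-style API."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"


def ask_about_image(path: str, question: str) -> str:
    """Send one multimodal chat completion request and return the answer."""
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": image_to_data_url(path)}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 256,
    }
    response = requests.post(SERVER_URL, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


# The first request is answered correctly; the second, with a different
# image, comes back confused, as if the old image were still in context.
print(ask_about_image("first.jpg", "What is in this image?"))
print(ask_about_image("second.jpg", "What is in this image?"))
```

Each request is independent (no shared conversation history), so I would expect the second image to be evaluated on its own.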