Triton keeps restarting during LLM model loading until it suddenly works
Description I am deploying a 7B open-source LLM on a Triton server with 32Gi memory, 8 CPUs, and 2 T4 GPUs. I also have some other code deployed as models (which doesn't take much CPU/memory) in the same container. When I deploy only the other models, the pod works fine. But when I add the LLM, I notice that the pod restarts suddenly during LLM model loading (from Hugging Face). The model weights are copied to the container during pod initialization. It sometimes restarts 5-6 times until it starts working, but every time, after a few restarts, the pod does end up working without any issues.
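For reference, a Kubernetes resource spec matching the setup described above might look like the following sketch (the field values mirror the description; this is not the actual manifest):

```yaml
# Hypothetical container resource spec for the described pod:
# 32Gi memory, 8 CPUs, 2 T4 GPUs. Illustrative only.
resources:
  requests:
    cpu: "8"
    memory: "32Gi"
    nvidia.com/gpu: "2"
  limits:
    cpu: "8"
    memory: "32Gi"
    nvidia.com/gpu: "2"
```

One thing worth noting with a spec like this: if the memory limit equals the request and CPU RAM usage briefly spikes past it while the weights are being loaded, the kubelet OOM-kills the container without Triton logging anything, which would be consistent with a restart that leaves no error in the logs.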
Triton Information nvcr.io/nvidia/tritonserver:23.09-pyt-python-py3
Are you using the Triton container or did you build it yourself? Using the official container listed above.
To Reproduce
- Create a model which loads a 7b Huggingface LLM
- Try loading it in triton
- Pod restarts a few times before it eventually works.
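A minimal Python-backend model configuration for the reproduction above might look like this (the model name, tensor names, and shapes are illustrative, not taken from the actual deployment):

```protobuf
# models/llm_7b/config.pbtxt -- hypothetical sketch of a Python-backend
# model that loads a 7B Hugging Face LLM in its model.py initialize()
name: "llm_7b"
backend: "python"
max_batch_size: 0

input [
  { name: "prompt", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "generated_text", data_type: TYPE_STRING, dims: [ 1 ] }
]

instance_group [
  { count: 1, kind: KIND_GPU }
]
```

The accompanying `model.py` would call the Hugging Face `from_pretrained` loader in its `initialize()` method, which is the phase during which the restarts are observed.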
Expected behavior Pod starts in the first attempt without any restarts.
I have not seen this issue before. Do you have access to the logs when the Pod restarts?
Yes, I do have access, but there aren't any useful logs. The logs show the pod loading the model, and then the next log is the pod restarting.
Had this exact issue with many other models: YOLOv8, custom BLS, and YOLOv5. Still unable to reproduce.