Triton keeps restarting during LLM model loading until it suddenly works
Description I am deploying a 7B open-source LLM on a Triton server with 32Gi memory, 8 CPUs, and 2 T4 GPUs. I also have some other code deployed as models (which doesn't take much CPU/memory) in the same container. When I deploy only the other models, the pod works fine. But when I add the LLM, I notice that the pod restarts suddenly during LLM model loading (from Hugging Face). The model weights are copied to the container during pod initialization. It sometimes restarts 5-6 times until it starts working, but every time, after a few restarts, the pod does end up working without any issues.
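For reference, a Kubernetes resource spec matching the setup described above might look like the following sketch (the field values mirror the description; this is not the actual manifest):

```yaml
# Hypothetical container resource spec for the described pod:
# 32Gi memory, 8 CPUs, 2 T4 GPUs. Illustrative only.
resources:
  requests:
    cpu: "8"
    memory: "32Gi"
    nvidia.com/gpu: "2"
  limits:
    cpu: "8"
    memory: "32Gi"
    nvidia.com/gpu: "2"
```

One thing worth noting with a spec like this: if the memory limit equals the request and CPU RAM usage briefly spikes past it while the weights are being loaded, the kubelet OOM-kills the container without Triton logging anything, which would be consistent with a restart that leaves no error in the logs.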
Triton Information nvcr.io/nvidia/tritonserver:23.09-pyt-python-py3
Are you using the Triton container or did you build it yourself? Using the official container listed above.
To Reproduce
- Create a model which loads a 7b Huggingface LLM
- Try loading it in triton
- Pod restarts a few times before it eventually works.
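A minimal Python-backend model configuration for the reproduction above might look like this (the model name, tensor names, and shapes are illustrative, not taken from the actual deployment):

```protobuf
# models/llm_7b/config.pbtxt -- hypothetical sketch of a Python-backend
# model that loads a 7B Hugging Face LLM in its model.py initialize()
name: "llm_7b"
backend: "python"
max_batch_size: 0

input [
  { name: "prompt", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "generated_text", data_type: TYPE_STRING, dims: [ 1 ] }
]

instance_group [
  { count: 1, kind: KIND_GPU }
]
```

The accompanying `model.py` would call the Hugging Face `from_pretrained` loader in its `initialize()` method, which is the phase during which the restarts are observed.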
Expected behavior Pod starts in the first attempt without any restarts.
I have not seen this issue before. Do you have access to the logs when the Pod restarts?
Yes, I do have access, but there aren't any useful logs. The logs show the pod loading the model, and then the next log is the pod restarting.
Had this exact issue with many other models: YOLOv8, custom BLS, and YOLOv5. Still unable to reproduce.