server Sending two "load" requests to server makes it load twice

Open ShuaiShao93 opened this issue 1 year ago • 2 comments

Description When I use two clients to send /v2/repository/models/MODEL/load requests to the same server at the same time, the model is loaded twice

Triton Information What version of Triton are you using? 23.11

Are you using the Triton container or did you build it yourself? Container nvcr.io/nvidia/tritonserver:23.11-py3

To Reproduce Start a server in explicit mode, and load no model.

Open two terminals and run curl -X POST "http://localhost:8000/v2/repository/models/MODEL/load" -d "{}" in both at the same time. The server then prints logs like

 successfully loaded MODEL
loading: MODEL
successfully loaded MODEL
successfully unloaded MODEL
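For anyone trying to reproduce this without juggling two terminals, here is a minimal Python sketch that fires the two load requests at (nearly) the same moment. It assumes the server from the report is listening on http://localhost:8000 and that the model is named MODEL; the helper names load_url, post_load and send_concurrent_loads are made up for illustration and are not part of any Triton client library.

```python
import threading
import urllib.request


def load_url(base_url, model_name):
    """Build the explicit-mode load URL used in the report."""
    return f"{base_url.rstrip('/')}/v2/repository/models/{model_name}/load"


def post_load(url, timeout=60.0):
    """POST an empty JSON body to the load endpoint and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def send_concurrent_loads(url, n=2, sender=post_load):
    """Send n load requests from n threads released together by a barrier.

    `sender` is injectable so the dispatch logic can be exercised
    without a running server.
    """
    barrier = threading.Barrier(n)
    results = [None] * n

    def worker(i):
        barrier.wait()  # hold every thread until all n are ready
        try:
            results[i] = sender(url)
        except Exception as exc:  # keep the failure instead of losing it
            results[i] = exc

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling send_concurrent_loads(load_url("http://localhost:8000", "MODEL")) against a server started in explicit mode should issue both requests together; whether the double load shows up may still depend on timing and on the model, so the server log is the thing to watch.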

Expected behavior The model should be loaded only once. Also, the log line successfully unloaded MODEL should appear before successfully loaded MODEL, not after it.

Mar 21 '24 18:03 ShuaiShao93

Hi @ShuaiShao93, thanks a lot for reaching out. Can you provide the following details?

What type of model/backend is this?
Can you reproduce this behavior with other types of models/backends, or is it specific to this one?
I'm not sure how you are getting the unloaded log. Are you making an unload request?

I am unable to reproduce this

When I try to load a model simultaneously it just gets loaded once.

Mar 25 '24 22:03 indrajit96

> Hi @ShuaiShao93, thanks a lot for reaching out. Can you provide the following details?
>
> What type of model/backend is this?

Ensemble pipeline with Python & ONNX backends

> Can you reproduce this behavior with other types of models/backends, or is it specific to this one?

Sorry, I didn't get a chance to test more

> I'm not sure how you are getting the unloaded log. Are you making an unload request?

No, I just made load requests simultaneously from two clients, and I saw the unloaded logs

> I am unable to reproduce this
>
> When I try to load a model simultaneously it just gets loaded once.

Apr 08 '24 03:04 ShuaiShao93

@ShuaiShao93 I guess this is expected behavior under explicit model control. If you want to check whether a particular model is already loaded before sending the load request, you can always query the /index endpoint to get the list of loaded models.
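A rough Python sketch of that check, in case it helps. It assumes the index is served at POST /v2/repository/index and returns a JSON list of entries with "name" and "state" fields, with "READY" marking a loaded model; that is my understanding of the model repository extension, so verify it against your Triton version. The function names fetch_repository_index and is_model_ready are hypothetical.

```python
import json
import urllib.request


def fetch_repository_index(base_url, timeout=10.0):
    """Fetch the repository index; assumes POST /v2/repository/index returns a JSON list."""
    url = f"{base_url.rstrip('/')}/v2/repository/index"
    req = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_model_ready(index_entries, model_name):
    """Return True if any index entry for model_name reports state READY.

    The "name"/"state" keys and the "READY" value are assumptions about
    the index response format.
    """
    return any(
        entry.get("name") == model_name and entry.get("state") == "READY"
        for entry in index_entries
    )
```

Note that checking the index and then sending the load are two separate requests, so two clients can still both see "not loaded" and both send a load. This narrows the window but does not close it.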

Jul 09 '24 18:07 sourabh-burnwal
