Tritonserver may load a model multiple times

Open vonchenplus opened this issue 4 months ago • 3 comments

Description When tritonserver is started with --model-control-mode=explicit so that the llava-mixtral-8x7b model is loaded on demand, a load_model call from my client sometimes (not on every attempt) causes the server to load the same model multiple times.

Triton Information nvcr.io/nvidia/tritonserver:23.08-py3

To Reproduce

Start tritonserver with --model-control-mode=explicit.
Create gRPC clients and call load_model for the same model from multiple processes.

The model uses the Python backend and loads llava-mixtral-8x7b in its initialize method.
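
The client side of these steps could look roughly like the sketch below. It is not the reporter's script. It assumes the `tritonclient` Python package, Triton's default gRPC port on `localhost:8001`, and four client processes. The helper names `make_client`, `request_load`, and `main` are made up for illustration. Whether a given run actually triggers the duplicate load is not guaranteed, since the report describes the problem as intermittent.

```python
import multiprocessing as mp

# Model name taken from the report; it must match the directory name
# in the server's model repository.
MODEL_NAME = "llava-mixtral-8x7b"
# Assumption: the server listens on Triton's default gRPC port.
SERVER_URL = "localhost:8001"


def make_client(url):
    """Create a Triton gRPC client (imported lazily inside each process)."""
    import tritonclient.grpc as grpcclient

    return grpcclient.InferenceServerClient(url=url)


def request_load(url, model_name, client_factory=make_client):
    """Ask the server to load model_name, then report whether it is ready."""
    client = client_factory(url)
    client.load_model(model_name)
    return client.is_model_ready(model_name)


def main(num_procs=4):
    """Send the same load request from several processes at about the same time."""
    args = [(SERVER_URL, MODEL_NAME)] * num_procs
    with mp.Pool(num_procs) as pool:
        results = pool.starmap(request_load, args)
    print(results)
```

To try it, call `main()` under an `if __name__ == "__main__":` guard while the server is running, then check the server log for repeated load messages for the same model.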

Expected behavior Models should only be loaded once

The following is the error log triton_server.log

Mar 29 '24 11:03 vonchenplus

@vonchenplus would it be possible to confirm this with 24.03 release?

Apr 03 '24 04:04 nnshah1

@vonchenplus would it be possible to confirm this with 24.03 release?

Hello @nnshah1, I still have the same problem with 24.02.

The following is the error log triton_server.log

Apr 04 '24 09:04 vonchenplus

Thanks for the confirmation - will try to reproduce

Apr 05 '24 05:04 nnshah1
