Tritonserver may load the same model multiple times

Open vonchenplus opened this issue 4 months ago • 3 comments

Description When using tritonserver with delayed loading (--model-control-mode=explicit) for the llava-mixtral-8x7b model, a load_model call from my client sometimes triggers the server to load the same model multiple times. This happens intermittently, not on every attempt.

Triton Information nvcr.io/nvidia/tritonserver:23.08-py3

To Reproduce

  1. Start tritonserver with --model-control-mode=explicit
  2. Create gRPC clients and call load_model for the same model from multiple processes.

The model uses the Python backend and loads llava-mixtral-8x7b in its initialize method.
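
The reproduction steps above could be sketched roughly as follows. This is a minimal sketch, not the reporter's actual client: the URL `localhost:8001`, the helper names `make_triton_client` and `concurrent_load`, and the worker count are assumptions, and threads stand in for the separate client processes described above. `InferenceServerClient` and `load_model` come from the `tritonclient.grpc` package; the import is deferred so the helper can also be exercised with a stub client.

```python
from concurrent.futures import ThreadPoolExecutor


def make_triton_client(url="localhost:8001"):
    """Create a gRPC client; the URL is an assumed local gRPC endpoint."""
    import tritonclient.grpc as grpcclient  # needs the tritonclient[grpc] package
    return grpcclient.InferenceServerClient(url=url)


def concurrent_load(model_name, workers=4, client_factory=make_triton_client):
    """Issue load_model for the same model from several workers at once.

    Returns one entry per worker: None on success, or the exception raised.
    """
    def _load(_):
        try:
            # One client per worker, as separate client processes would have.
            client = client_factory()
            client.load_model(model_name)
            return None
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_load, range(workers)))
```

Calling `concurrent_load("<model name>")` against a server started with `--model-control-mode=explicit`, and then checking the server log for repeated load messages, would be one way to see whether the duplicate load occurs.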

Expected behavior Models should only be loaded once
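
Until the server-side behaviour is understood, a client-side guard is one possible way to approximate "load once". The sketch below is an assumption-laden workaround, not a fix for the reported behaviour: `load_model_once` and the lock path are hypothetical, the file lock only serialises clients on the same host (POSIX only), and it assumes `load_model` returns after loading has finished. `is_model_ready` and `load_model` are methods of the `tritonclient.grpc` client.

```python
import fcntl


def load_model_once(client, model_name, lock_path="/tmp/triton_load.lock"):
    """Load model_name only if the server does not already report it ready.

    An exclusive file lock serialises callers on the same host, so the
    readiness check and the load request are not interleaved between processes.
    Returns True if this call issued the load, False if it was skipped.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            if client.is_model_ready(model_name):
                return False
            client.load_model(model_name)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each client process would call `load_model_once(client, name)` instead of calling `load_model` directly.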

The following is the error log triton_server.log

vonchenplus avatar Mar 29 '24 11:03 vonchenplus

@vonchenplus would it be possible to confirm this with 24.03 release?

nnshah1 avatar Apr 03 '24 04:04 nnshah1

> @vonchenplus would it be possible to confirm this with 24.03 release?

Hello @nnshah1, I still have the same problem with 24.02.

The following is the error log triton_server.log

vonchenplus avatar Apr 04 '24 09:04 vonchenplus

Thanks for the confirmation - will try to reproduce

nnshah1 avatar Apr 05 '24 05:04 nnshah1