Enabling model loading from BLS models even when control mode is not explicit

Open MatthieuToulemont opened this issue 1 year ago • 1 comments

I am currently having issues with Triton's startup time, as I load my models from a synchronous client one model at a time.

A quick solution I found was to not use the explicit mode as Triton will then attempt to load all models from the repository.

One issue I run into is that, with explicit mode deactivated, my Python (BLS) models still load other models during their initialize step.

I understand why, but I think this should not happen. Being able to load models from a BLS model is a great improvement that makes BLS models self-contained.

Hence I would like to know if it is possible to keep this behaviour when we deactivate the explicit control mode.

If it is not possible, how can I know which mode Triton is running in from inside the BLS model?

MatthieuToulemont avatar Feb 20 '24 16:02 MatthieuToulemont

> how can I know in which mode Triton is running inside the BLS model?

I don't think we expose the model control mode to the Python backend, or anywhere else; @kthui will correct me if I'm wrong.

I believe that for any mode other than explicit, it is assumed that models are already loaded. Having BLS models load child models themselves might be tricky, as this will affect inference time. In addition, suppose that at startup you have filled all your memory: there is then no space left for the child models, yet the BLS model is shown as loaded, which is confusing and not true.

oandreeva-nv avatar Feb 24 '24 01:02 oandreeva-nv

> I believe that for any mode other than explicit, it is assumed that models are already loaded. Having BLS models load child models themselves might be tricky, as this will affect inference time. In addition, suppose that at startup you have filled all your memory: there is then no space left for the child models, yet the BLS model is shown as loaded, which is confusing and not true.

I am fine with not being able to load child models in BLS models when the control mode is not explicit, but I would need to know which control mode I am in, so that I can disable the child model loading I rely on in explicit mode.

I guess this could be done with an environment variable but I was wondering if there was a cleaner way to do it.
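The environment-variable workaround could look roughly like the sketch below. The variable name `BLS_LOAD_CHILD_MODELS` and both helper functions are hypothetical and chosen only for illustration. Triton does not set such a variable itself, so it would have to be exported alongside the server launch. The `load_fn` and `is_ready_fn` arguments stand in for whatever load and readiness calls the BLS model already uses in its initialize step; they are passed in so the logic can be exercised outside Triton.

```python
import os

# Hypothetical variable name; Triton does not define it. It would need to be
# exported next to the server launch, e.g. only when running in explicit mode.
ENV_FLAG = "BLS_LOAD_CHILD_MODELS"


def child_loading_enabled(env=None):
    """Return True when the (hypothetical) flag asks the BLS model to load children."""
    env = os.environ if env is None else env
    return env.get(ENV_FLAG, "0").strip().lower() in {"1", "true", "yes"}


def load_children(names, load_fn, is_ready_fn, env=None):
    """Load each child model that is not ready yet; return the names loaded.

    load_fn and is_ready_fn are injected callables that take a model name.
    Inside a BLS model they would wrap the backend's own load/readiness calls.
    """
    if not child_loading_enabled(env):
        # Flag unset or false: assume Triton has already loaded every model.
        return []
    loaded = []
    for name in names:
        if not is_ready_fn(name):
            load_fn(name)
            loaded.append(name)
    return loaded
```

With the flag unset, `load_children` does nothing, so the same BLS model code can run under either control mode without changes.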

MatthieuToulemont avatar Feb 26 '24 07:02 MatthieuToulemont

Hi,

> A quick solution I found was to not use the explicit mode as Triton will then attempt to load all models from the repository.

Does --model-control-mode=explicit --load-model=* help? It loads all models on startup while still using explicit mode.
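As a minimal sketch of that launch, the helper below builds the argument list for the two flags above. The repository path `/models` and the function names are assumptions made for illustration. Passing the arguments as a list (for example to `subprocess.run`) avoids a shell, so the `*` cannot be glob-expanded; when typing the command into a shell, quoting that argument has the same effect.

```python
import shlex


def triton_command(model_repository, load_all=True):
    """Build an argv list that starts tritonserver in explicit mode."""
    argv = [
        "tritonserver",
        f"--model-repository={model_repository}",
        "--model-control-mode=explicit",
    ]
    if load_all:
        # Ask Triton to load every model in the repository at startup.
        argv.append("--load-model=*")
    return argv


def as_shell_line(argv):
    """Render argv as one shell-safe line; the '*' argument gets quoted."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "/models" is a placeholder path, not taken from the thread.
    print(as_shell_line(triton_command("/models")))
```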

kthui avatar Feb 26 '24 23:02 kthui

That could work indeed!

MatthieuToulemont avatar Mar 19 '24 14:03 MatthieuToulemont