Question: Clarification on the usage of the passive flag and secondary devices in model configurations
Hello,
I am currently exploring the usage of the passive flag and secondary devices for a given instance group in a model configuration file.
While I have reviewed the code in core/backend_model.cc and core/backend_model_instance.cc, I am still unclear on their actual application and how they can be effectively implemented.
I also noticed that the implementation of secondary devices appears to be specific to the NVDLA TensorRT backend alone at the moment. Additionally, I see that APIs for handling secondary devices, such as the one created by @tanmayv25 (PR #26), seem to align with this purpose.
Furthermore, passive instances are being registered in the background, as indicated in the following code snippet: Registering passive instances.
Could you please provide guidance on the following?
- Is there any documentation on the purpose of these secondary devices / the passive flag?
- How can passive instances and secondary devices be utilized effectively for a given backend and model instance?
- Are there any examples or scenarios that demonstrate their practical use?
Any clarification on this would be highly appreciated. Looking forward to your insights.
Thank you.
Hello @tanmayv25 and team, could you please help with the query above? Any guidance would greatly help me on this.
@tanmayv25 could you follow-up on this one?
Is there any documentation on the purpose of these secondary devices / the passive flag?
Secondary devices and passive model instances are separate concepts.
The secondary device is indeed used to specify an NVDLA device for the TensorRT backend. Details here. The secondary device specification describes whether the model instance processes requests on the NVDLA or on the GPUs.
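As a sketch, an instance group targeting NVDLA might look like the following in the model configuration (assuming the `secondary_devices` field of `instance_group` with `KIND_NVDLA`, as defined in Triton's model config protobuf; the device IDs here are illustrative):

```proto
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
    secondary_devices [
      {
        kind: KIND_NVDLA
        device_id: 0
      }
    ]
  }
]
```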
Passive model instances, on the other hand, are model instances that will not receive any requests from Triton Core. See here for details. A passive model instance will be loaded and initialized by Triton, but no inference requests will be sent to the instance. Passive instances are typically used by a custom backend that uses its own mechanisms to distribute work to the passive instances.
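For illustration, marking an instance group passive is just a boolean in the model configuration (a minimal sketch assuming the `passive` field of `instance_group`; counts and device IDs are illustrative):

```proto
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
    passive: true
  }
]
```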
How can passive instances and secondary devices be utilized effectively for a given backend and model_instance?
They are separate concepts. If you are running inference on a system with an NVIDIA DLA, you will have a model instance for the target device defined in the model config.
If you want to drive requests to model instances via your own backend, then you'd need passive model instances.
@tanmayv25 Thanks for the info.
Passive instances are typically used by a custom backend that uses its own mechanisms to distribute work to the passive instances.
But the usage of passive instances is still not fully clear to me, specifically how we can drive the passive model instances from a custom backend. For example, in the core code here:
```cpp
RETURN_IF_ERROR(local_instance->GenerateWarmupData());
RETURN_IF_ERROR(model->Server()->GetRateLimiter()->RegisterModelInstance(
    local_instance.get(), rate_limiter_config));
RETURN_IF_ERROR(local_instance->SetBackendThread(
    kind, device_id, model->DeviceBlocking()));
}
```
only when it is not a passive instance do we actually create the backend thread, where ideally the model execution happens. But if I have to use a passive instance in a custom backend, shall I assume that the custom backend should take care of all this logic (like creating its own threads and scheduling) and send requests for execution?
If so, could you please point me to any available example of how to utilize passive instances in a custom backend?
@rmccorm4 @tanmayv25 could you please provide insights on the above question?
@tanmayv25 @rmccorm4 could someone please help me with the answer?