What are the solutions for concurrent AI model inference in DJL?
Description
What are the solutions for concurrency in AI model inference in DJL? Can multiple threads access a model at the same time? Does DJL support NVIDIA Triton?
Will this change the current API? How?
Who will benefit from this enhancement?
DJL is a low-level library. We have DJLServing as a model server, which is designed as a general inference platform, and we do support running tritoncore inside DJLServing. Please take a look: https://docs.djl.ai/master/docs/serving/index.html
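For the low-level DJL API itself, the usual concurrency pattern is to load a model once and share it across threads, while giving each thread its own `Predictor`, since a loaded model is safe to share but a `Predictor` is not thread-safe. Below is a minimal sketch, assuming an image-classification model; the artifact id `"resnet"`, the thread count, and the input file `kitten.jpg` are placeholders, not part of the original answer.

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class ConcurrentInference {

    public static void main(String[] args) throws Exception {
        // Describe the model to load; "resnet" is a placeholder artifact id.
        Criteria<Image, Classifications> criteria =
                Criteria.builder()
                        .setTypes(Image.class, Classifications.class)
                        .optArtifactId("resnet")
                        .build();

        // Load the model once; the ZooModel can be shared across threads.
        try (ZooModel<Image, Classifications> model = criteria.loadModel()) {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                pool.submit(() -> {
                    // Predictor is NOT thread-safe: create one per thread.
                    try (Predictor<Image, Classifications> predictor =
                            model.newPredictor()) {
                        Image img = ImageFactory.getInstance()
                                .fromFile(Paths.get("kitten.jpg")); // placeholder input
                        Classifications result = predictor.predict(img);
                        System.out.println(result);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

For higher request throughput without managing threads yourself, DJLServing (linked above) handles batching and worker scaling on top of this same predictor model.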