What are the solutions for concurrent AI model inference in DJL?
Description
What are the solutions for concurrency in AI model inference in DJL? Can multiple threads access a model at the same time? Does DJL support NVIDIA Triton?
Will this change the current API? How?
Who will benefit from this enhancement?
DJL is a low-level library. We have DJLServing as a model server, which is designed as a general inference platform, and we do support running tritoncore inside DJLServing. Please take a look: https://docs.djl.ai/master/docs/serving/index.html
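For the low-level DJL API itself, the usual concurrency pattern is to load a model once and share it across threads, while giving each thread its own `Predictor`, since a loaded model is safe to share but a `Predictor` is not thread-safe. Below is a minimal sketch, assuming an image-classification model; the artifact id `"resnet"`, the thread count, and the input file `kitten.jpg` are placeholders, not part of the original answer.

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class ConcurrentInference {

    public static void main(String[] args) throws Exception {
        // Describe the model to load; "resnet" is a placeholder artifact id.
        Criteria<Image, Classifications> criteria =
                Criteria.builder()
                        .setTypes(Image.class, Classifications.class)
                        .optArtifactId("resnet")
                        .build();

        // Load the model once; the ZooModel can be shared across threads.
        try (ZooModel<Image, Classifications> model = criteria.loadModel()) {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                pool.submit(() -> {
                    // Predictor is NOT thread-safe: create one per thread.
                    try (Predictor<Image, Classifications> predictor =
                            model.newPredictor()) {
                        Image img = ImageFactory.getInstance()
                                .fromFile(Paths.get("kitten.jpg")); // placeholder input
                        Classifications result = predictor.predict(img);
                        System.out.println(result);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

For higher request throughput without managing threads yourself, DJLServing (linked above) handles batching and worker scaling on top of this same predictor model.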