djl-serving
How to run inference with multiple models
How to run inference with multiple models, for example PaddleOCR, which has three models: det, cls, and rec.
There are two ways to solve your problem; see the following demos:
- Use a DJLServing workflow: https://github.com/deepjavalibrary/djl-demo/tree/master/djl-serving/workflows/multi-model
- Use a custom Translator and load the smaller models inside the Translator
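For option 1, a DJLServing workflow is defined in a `workflow.json` (or `.yml`) file that names the models and chains them as functions. The sketch below is a hypothetical two-stage pipeline; the model names (`det`, `rec`) and URLs are placeholders, not shipped artifacts:

```json
{
  "name": "ocr",
  "version": "0.1",
  "models": {
    "det": "src/test/resources/det_model",
    "rec": "src/test/resources/rec_model"
  },
  "workflow": {
    "out": "rec(det(in))"
  }
}
```

Each model listed under `models` gets its own worker pool, which is how this option scales the stages independently.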
Which of these two methods performs better? I want to try it.
@faquir-sun It really depends on how heavy your smaller models are. With option 1, each model can scale up to X number of workers independently, so it can leverage more CPU power concurrently. However, the communication between workflow steps may cause more overhead.
With option 2, everything runs in the same thread, and there is no serialization/deserialization between models. But only one worker is used for the small model, which may become a bottleneck.
For option 2, is there any example for reference?
@faquir-sun Here is an example that involves two models: https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/clip/ImageTextComparison.java
The above example treats the two models equally and uses a utility function to run inference. You can wrap one model into a custom Translator if you want to.
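A minimal sketch of option 2, wrapping a small model inside the main model's Translator. The class and field names (`RecWithDetTranslator`, `detPredictor`) are illustrative, not part of DJL; only `Translator`, `TranslatorContext`, `Predictor`, and `NDList` are real DJL types:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.cv.Image;
import ai.djl.ndarray.NDList;
import ai.djl.translate.Translator;
import ai.djl.translate.TranslatorContext;

// Hypothetical Translator for a "rec" model that first runs a small
// "det" model during pre-processing, all in the same thread.
public class RecWithDetTranslator implements Translator<Image, String> {

    // Predictor for the small detection model, created by the caller
    // from its own loaded Model and passed in here.
    private final Predictor<Image, NDList> detPredictor;

    public RecWithDetTranslator(Predictor<Image, NDList> detPredictor) {
        this.detPredictor = detPredictor;
    }

    @Override
    public NDList processInput(TranslatorContext ctx, Image input) throws Exception {
        // Run the small model first; no serialization between models.
        NDList detected = detPredictor.predict(input);
        // ... crop/transform `input` based on `detected`, then build the
        // NDList the main model expects (details omitted).
        return detected;
    }

    @Override
    public String processOutput(TranslatorContext ctx, NDList list) {
        // Decode the main model's output into text (details omitted).
        return list.toString();
    }
}
```

Because the small model runs inside `processInput`, it shares the main model's single worker thread, which is exactly the trade-off described above.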