
How to run inference with multiple models

Open polarisunny opened this issue 1 year ago • 5 comments

How can I run inference with multiple models? For example, PaddleOCR has three models: det, cls, and rec. (image attached)

polarisunny avatar Aug 30 '23 04:08 polarisunny

There are two ways to solve your problem, see the following demo:

  1. Use a DJLServing workflow: https://github.com/deepjavalibrary/djl-demo/tree/master/djl-serving/workflows/multi-model
  2. Use a custom Translator and load the smaller models inside the Translator
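For option 1, a DJL Serving workflow is declared in a JSON definition file. The sketch below is illustrative only: the model names, URLs, and step names are placeholders (the real model artifacts for PaddleOCR would need to be filled in), but it shows the general shape used in the linked demo, where each step's output feeds the next model:

```json
{
  "name": "ocr-pipeline",
  "version": "0.1",
  "models": {
    "det": "https://example.com/models/det.zip",
    "rec": "https://example.com/models/rec.zip"
  },
  "workflow": {
    "boxes": ["det", "in"],
    "out": ["rec", "boxes"]
  }
}
```

With a definition like this, DJL Serving can scale the `det` and `rec` workers independently, which is the advantage discussed below.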

frankfliu avatar Aug 30 '23 15:08 frankfliu

Which of these two methods performs better? I want to try it.

polarisunny avatar Aug 31 '23 02:08 polarisunny

@faquir-sun It really depends on how heavy your smaller models are. For option 1, each model can scale up to its own number of workers independently, so you can leverage more CPU power concurrently. However, the communication between workflow steps may cause more overhead.

For option 2, everything runs in the same thread, and there is no serialization/deserialization between models. But only one worker is used for the small models, which may become a bottleneck.
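To make the trade-off concrete, here is a minimal plain-Java toy (no DJL types; the stage functions are made up) of the option 2 shape: the stages run back-to-back on the caller's thread, and intermediate results stay as in-memory Java objects with no serialization between models:

```java
import java.util.List;
import java.util.function.Function;

// Toy single-thread pipeline. In real DJL code, each stage would be a
// Predictor call made from inside one Translator, all on the same thread.
public class SingleThreadPipeline {

    // Stand-in for "det": find text regions (here: just split on spaces).
    static Function<String, List<String>> det = s -> List.of(s.split(" "));

    // Stand-in for "cls"/"rec": recognize each region (here: upper-case it).
    static Function<List<String>, List<String>> rec =
            regions -> regions.stream().map(String::toUpperCase).toList();

    public static List<String> infer(String input) {
        // Stages run sequentially; there is no hand-off to other workers,
        // which is why a slow small model can bottleneck the whole pipeline.
        return det.andThen(rec).apply(input);
    }

    public static void main(String[] args) {
        System.out.println(infer("hello world"));
    }
}
```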

frankfliu avatar Aug 31 '23 19:08 frankfliu

For option 2, is there any example for reference?

polarisunny avatar Sep 07 '23 11:09 polarisunny

@faquir-sun Here is an example that involves two models: https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/clip/ImageTextComparison.java

The example above treats the two models equally and uses a utility function to run the inference. You can wrap one model into a custom Translator if you want to.
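Stripped of DJL types, the pattern in that example looks like the toy below (everything here is made up for illustration, not the real CLIP API): run each model separately, then combine the raw outputs with a utility function such as cosine similarity:

```java
public class TwoModelComparison {

    // Stand-ins for the two predictors. Real code would call
    // predictor.predict(...) on an image model and a text model.
    static double[] imageFeatures(String imagePath) {
        return new double[] {1.0, 0.0}; // dummy embedding
    }

    static double[] textFeatures(String text) {
        return new double[] {1.0, 0.0}; // dummy embedding
    }

    // Utility function that treats both models' outputs equally.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double sim = cosine(imageFeatures("cat.jpg"), textFeatures("a cat"));
        System.out.println("similarity = " + sim);
    }
}
```

Alternatively, one of the two `*Features` calls could be folded into the other model's Translator, which is the "wrap one model into a custom translator" variant mentioned above.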

frankfliu avatar Sep 07 '23 16:09 frankfliu