djl-serving
How to run inference with multiple models
How to run inference with multiple models, for example PaddleOCR, which has three models: det, cls, and rec.
There are two ways to solve your problem; see the following demos:
- Use a DJLServing workflow: https://github.com/deepjavalibrary/djl-demo/tree/master/djl-serving/workflows/multi-model
- Use a custom Translator and load the smaller models inside the Translator
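For option 1, a DJLServing workflow is defined in a `workflow.json` (or `.yml`) file that names the models and chains them as functions. The sketch below is a hypothetical two-stage pipeline; the model names (`det`, `rec`) and URLs are placeholders, not shipped artifacts:

```json
{
  "name": "ocr",
  "version": "0.1",
  "models": {
    "det": "src/test/resources/det_model",
    "rec": "src/test/resources/rec_model"
  },
  "workflow": {
    "out": "rec(det(in))"
  }
}
```

Each model listed under `models` gets its own worker pool, which is how this option scales the stages independently.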
Which of these two methods performs better? I want to try it.
@faquir-sun It really depends on how heavy your smaller models are. With option 1, each model can scale up to X number of workers independently, so it can leverage more CPU power concurrently. However, the communication between workflow steps may cause more overhead.
With option 2, everything runs in the same thread, and there is no serialization/deserialization between models. But only one worker is used for the small model, which may become a bottleneck.
For option 2, is there any example for reference?
@faquir-sun Here is an example that involves two models: https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/clip/ImageTextComparison.java
The above example treats the two models equally and uses a utility function to run inference. You can wrap one model into a custom Translator if you want to.
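A minimal sketch of option 2, wrapping a small model inside the main model's Translator. The class and field names (`RecWithDetTranslator`, `detPredictor`) are illustrative, not part of DJL; only `Translator`, `TranslatorContext`, `Predictor`, and `NDList` are real DJL types:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.cv.Image;
import ai.djl.ndarray.NDList;
import ai.djl.translate.Translator;
import ai.djl.translate.TranslatorContext;

// Hypothetical Translator for a "rec" model that first runs a small
// "det" model during pre-processing, all in the same thread.
public class RecWithDetTranslator implements Translator<Image, String> {

    // Predictor for the small detection model, created by the caller
    // from its own loaded Model and passed in here.
    private final Predictor<Image, NDList> detPredictor;

    public RecWithDetTranslator(Predictor<Image, NDList> detPredictor) {
        this.detPredictor = detPredictor;
    }

    @Override
    public NDList processInput(TranslatorContext ctx, Image input) throws Exception {
        // Run the small model first; no serialization between models.
        NDList detected = detPredictor.predict(input);
        // ... crop/transform `input` based on `detected`, then build the
        // NDList the main model expects (details omitted).
        return detected;
    }

    @Override
    public String processOutput(TranslatorContext ctx, NDList list) {
        // Decode the main model's output into text (details omitted).
        return list.toString();
    }
}
```

Because the small model runs inside `processInput`, it shares the main model's single worker thread, which is exactly the trade-off described above.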