Model chaining example

Open SlipknotTN opened this issue 5 years ago • 5 comments

The README mentions the possibility of model chaining ("Model Chaining: Model A -> Glue -> Model B -> etc."), but I didn't find an example in the repository.

Is there an example available? Any hints on how to do that?

I'd like to group multiple models into a single client call to save transfer time.

SlipknotTN avatar Mar 14 '19 09:03 SlipknotTN

I’ll whip up an example.

Help me understand your use case a bit more and I'll see if I can put together an example that helps you get to where you want to go.

ryanolson avatar Mar 14 '19 19:03 ryanolson

Thank you, my use case is like this:

Client sends an image -> Model A (TensorRT) on the server -> Model B (TensorRT) on the server -> custom C++ code on the server -> results back to the client.

The intermediate results are large, so I'd like to keep the processing on the server end-to-end.
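
Roughly, I'm picturing a single server-side handler like the hypothetical sketch below (InferA, InferB, and PostProcess are made-up placeholder names, just to show that only the final result crosses the wire):

```cpp
// Hypothetical server-side handler, only to illustrate the flow I'm after.
// InferA / InferB / PostProcess are placeholders, not real APIs.
#include <cstdint>
#include <vector>

std::vector<float> InferA(const std::vector<uint8_t>& image);      // Model A (TensorRT)
std::vector<float> InferB(const std::vector<float>& features);     // Model B (TensorRT)
std::vector<float> PostProcess(const std::vector<float>& logits);  // custom C++ step

std::vector<float> HandleRequest(const std::vector<uint8_t>& image)
{
    auto a = InferA(image);   // large intermediate, never leaves the server
    auto b = InferB(a);       // large intermediate, never leaves the server
    return PostProcess(b);    // only this small result goes back to the client
}
```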

SlipknotTN avatar Mar 15 '19 09:03 SlipknotTN

The outputs of Model A are the inputs for Model B?

How about this for an example:

  • Decompose ResNet-152 into two TensorRT engines
    • Model A = base model which consists of the first 100-ish layers
    • Model B = customization model which consists of the remaining layers
    • Presumably you could have many customized models that all leverage the same base model.
    • The inference request will specify: base_model, customized_model
    • We will use the buffer-reuse options of the CyclicAllocator and the ExecutionContext to minimize the memory footprint of the transaction (a rough sketch of the chained execution follows the list)
  • Provide the custom C++ post-processing lambda
    • Assume that the post-processing is heavy, so we'll provide some dedicated threads for "extra post-processing" outside the typical request lifecycle.
    • We'll add a random 1-2 ms of "post-processing"
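
Here is a rough sketch of the chained execution plus the dedicated post-processing step. It uses the plain TensorRT C++ API (explicit-batch engines and `enqueueV2`) rather than this library's CyclicAllocator/ExecutionContext wrappers, and it assumes Model A has a single output whose shape and dtype match Model B's single input; the buffer names, sizes, and thread handling are illustrative only:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

#include <chrono>
#include <cstddef>
#include <future>
#include <random>
#include <thread>
#include <vector>

// Run Model A, feed its output to Model B without leaving the GPU, then hand the
// host copy of the final tensor to a "post-processing" lambda on its own thread.
std::future<std::vector<float>> ChainAndPostProcess(
    nvinfer1::ICudaEngine& engineA, nvinfer1::ICudaEngine& engineB,
    const std::vector<float>& hostInput,
    std::size_t inputBytes, std::size_t intermediateBytes, std::size_t outputBytes)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    void* dInput = nullptr;
    void* dIntermediate = nullptr;  // Model A's output == Model B's input
    void* dOutput = nullptr;
    cudaMalloc(&dInput, inputBytes);
    cudaMalloc(&dIntermediate, intermediateBytes);
    cudaMalloc(&dOutput, outputBytes);

    cudaMemcpyAsync(dInput, hostInput.data(), inputBytes,
                    cudaMemcpyHostToDevice, stream);

    auto* ctxA = engineA.createExecutionContext();
    auto* ctxB = engineB.createExecutionContext();

    // Bindings are assumed to be ordered {input, output}; a real server would
    // look the indices up by tensor name with getBindingIndex().
    void* bindingsA[] = {dInput, dIntermediate};
    void* bindingsB[] = {dIntermediate, dOutput};

    ctxA->enqueueV2(bindingsA, stream, nullptr);  // Model A
    ctxB->enqueueV2(bindingsB, stream, nullptr);  // Model B reads A's output in place

    std::vector<float> hostOutput(outputBytes / sizeof(float));
    cudaMemcpyAsync(hostOutput.data(), dOutput, outputBytes,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    ctxA->destroy();
    ctxB->destroy();
    cudaFree(dInput);
    cudaFree(dIntermediate);
    cudaFree(dOutput);
    cudaStreamDestroy(stream);

    // "Extra post-processing" outside the GPU lifecycle, simulated here by a
    // random 1-2 ms sleep on a dedicated thread.
    return std::async(std::launch::async, [out = std::move(hostOutput)]() mutable {
        thread_local std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<int> delayUs(1000, 2000);
        std::this_thread::sleep_for(std::chrono::microseconds(delayUs(rng)));
        return std::move(out);
    });
}
```

The point of the shared `dIntermediate` buffer is that the Model A -> Model B hand-off never leaves the device, so chaining costs a second enqueue on the same stream rather than an extra host round trip; the real example would get those buffers from the CyclicAllocator instead of raw `cudaMalloc`.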

ryanolson avatar Mar 16 '19 11:03 ryanolson

Sorry for the question, but I saw that TensorRT Inference Server allows chaining more than one model (the feature is actually still in development) and that it is possible to add custom C++ code as a custom backend model. What is the relation between this project and TensorRT Inference Server? Is this a lower-level version of TRTIS?

SlipknotTN avatar Mar 28 '19 16:03 SlipknotTN

Good question. NvRPC in TRTIS originated from this project. I hope someday the team pulls in the tensorrt runtime.

ryanolson avatar Apr 02 '19 01:04 ryanolson