inference

FEAT: Support TensorRT-LLM backend

Open • aresnow1 opened this issue 1 year ago • 1 comment

Support the TensorRT-LLM backend.

  • [ ] Implement TRTModel with a generate method.
  • [ ] Expose launch_trt_model to the client.
  • [ ] Documentation and examples.

aresnow1 • Nov 14 '23 09:11

This PR needs a Python API for in-flight batching, which the TensorRT-LLM team says will be implemented in an upcoming release.

aresnow1 • Nov 20 '23 08:11