FEAT: Support TensorRT-LLM backend
Support TensorRT-LLM backend.
- [ ] Implement `TRTModel` with a `generate` method.
- [ ] Expose `launch_trt_model` to the client.
- [ ] Docs and an example.
A Python API for in-flight batching is needed for this PR; the TensorRT-LLM team says it will be implemented in future versions.
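A minimal sketch of what the proposed interface might look like. Only the names `TRTModel`, `generate`, and `launch_trt_model` come from the checklist above; the signatures, the stub implementation, and the launcher logic are illustrative assumptions, not the actual design (a real backend would call into the TensorRT-LLM runtime):

```python
from abc import ABC, abstractmethod


class TRTModel(ABC):
    """Proposed wrapper around a TensorRT-LLM engine (interface sketch only)."""

    @abstractmethod
    def generate(self, prompt: str, max_new_tokens: int = 128) -> str:
        """Run text generation on the underlying engine."""


class EchoTRTModel(TRTModel):
    # Placeholder used here only to make the sketch runnable; a real
    # implementation would invoke the TensorRT-LLM runtime instead.
    def generate(self, prompt: str, max_new_tokens: int = 128) -> str:
        return prompt + " [generated]"


def launch_trt_model(model_cls=EchoTRTModel) -> TRTModel:
    # Hypothetical client-side entry point named after the checklist item;
    # the real function would handle model download, engine build, etc.
    return model_cls()


model = launch_trt_model()
print(model.generate("Hello"))  # prints "Hello [generated]"
```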