
Support for TensorRT-LLM

Open SupreethRao99 opened this issue 1 year ago • 7 comments

Outlines currently supports the vLLM inference engine. It would be great if it could also support the TensorRT-LLM inference engine.

SupreethRao99 avatar Feb 11 '24 03:02 SupreethRao99

Yes, waiting for this as well.

teis-e avatar Feb 13 '24 14:02 teis-e

TensorRT-LLM supports logits processors so it should be possible to integrate.

How are you hoping to use it? Are you seeking outlines.models.tensorrt, a serve endpoint, or both?
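To make the logits-processor point concrete, here is a minimal sketch of the general technique, not TensorRT-LLM's or Outlines' actual API. It assumes a processor is a callback that receives the token ids generated so far plus the next-token logits and returns modified logits. The toy vocabulary, the `allowed_by_step` table, and both function names are hypothetical; in Outlines the set of allowed tokens would come from a finite-state machine rather than a step-indexed table.

```python
import math
from typing import Callable, Dict, List, Set

# Hypothetical toy vocabulary; a real integration would use the model's tokenizer.
VOCAB = ["yes", "no", "maybe", "<eos>"]

Processor = Callable[[List[int], List[float]], List[float]]


def make_allowed_tokens_processor(allowed_by_step: Dict[int, Set[int]]) -> Processor:
    """Build a processor that masks every token not allowed at the current step.

    The step-indexed table is an illustrative stand-in for the state machine
    a structured-generation library would use to decide which tokens are valid.
    """

    def processor(generated_ids: List[int], logits: List[float]) -> List[float]:
        allowed = allowed_by_step.get(len(generated_ids))
        if allowed is None:
            return logits  # no constraint registered for this step
        # Disallowed tokens get -inf so they can never be selected.
        return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

    return processor


def greedy_decode(processor: Processor, logits_per_step: List[List[float]]) -> List[int]:
    """Toy decoding loop: apply the processor, then pick the highest logit."""
    generated: List[int] = []
    for logits in logits_per_step:
        masked = processor(generated, logits)
        next_id = max(range(len(masked)), key=masked.__getitem__)
        generated.append(next_id)
        if VOCAB[next_id] == "<eos>":
            break
    return generated


# Step 0 may only emit "yes" or "no"; step 1 must emit "<eos>".
constrain = make_allowed_tokens_processor({0: {0, 1}, 1: {3}})
# Made-up logits: without the mask, step 0 would pick "maybe".
fake_logits = [[0.1, 0.5, 2.0, 0.0], [1.0, 0.2, 0.3, 0.1]]
tokens = [VOCAB[i] for i in greedy_decode(constrain, fake_logits)]
```

Any engine that lets a callback like this rewrite the logits before sampling can, in principle, be driven the same way, which is why logits-processor support matters for an integration.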

lapp0 avatar Feb 13 '24 16:02 lapp0

I'm looking at something similar to outlines.models.tensorrt right now, as my use case is mostly offline batched inference. Could you give me a starting point for how I can build this out? I'm eager to contribute and add such a feature.

SupreethRao99 avatar Feb 14 '24 03:02 SupreethRao99

@SupreethRao99 glad to hear you're interested in contributing.

I think a good starting point is looking into how TensorRT-LLM performs generation and handles LogitsProcessors:

https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/tensorrt_llm/runtime/generation.py#L368-L386

https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/tensorrt_llm/runtime/model_runner_cpp.py#L266-L267

Then I'd review how llamacpp is being implemented, since it shares similarities with how TensorRT-LLM would work: https://github.com/outlines-dev/outlines/pull/556/ Specifically, see llamacpp.py: https://github.com/dtiarks/outlines/blob/726ec242fb1695c5a67d489689be13ac84ef472c/outlines/models/llamacpp.py

Please let me know if you have any questions!

lapp0 avatar Feb 14 '24 03:02 lapp0

Thank you for the resources. I'll definitely get back to you with questions after going through these links.

Thanks!

SupreethRao99 avatar Feb 14 '24 04:02 SupreethRao99

Related to #655

rlouf avatar Feb 14 '24 08:02 rlouf

This can likely be implemented with the Executor API: https://github.com/NVIDIA/TensorRT-LLM/blob/31ac30e928a2db795799fdcab6be446bfa3a3998/examples/cpp/executor/README.md#L4

user-0a avatar Sep 16 '24 17:09 user-0a

Closing as duplicate

rlouf avatar Jun 25 '25 20:06 rlouf