Create the model Triton

Open RobinPicard opened this issue 7 months ago • 1 comment

This is a draft PR to validate the approach; it addresses issue #1551.

This PR creates an Outlines model that integrates with Triton servers running a TensorRT-LLM engine. It targets servers that expose an ensemble containing the preprocessing and postprocessing steps, so that the endpoint can receive plain HTTP requests. For instance:

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'

The request above is taken from the README of the tensorrtllm_backend repository.
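
For reference, the same request could be issued from Python roughly as follows. This is only a sketch using the standard library, not the code in this PR: the helper names `build_generate_request` and `generate` are hypothetical, and the URL, model name and payload fields are copied from the curl command above.

```python
import json
import urllib.request


def build_generate_request(
    prompt,
    max_tokens=20,
    base_url="http://localhost:8000",  # assumption: same host/port as the curl example
    model_name="ensemble",             # assumption: ensemble name from the curl example
):
    """Build the POST request for the ensemble's generate endpoint (hypothetical helper)."""
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model_name}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_tokens=20, timeout=30.0):
    """Send the request and return the parsed JSON body (requires a running server)."""
    request = build_generate_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `generate("What is machine learning?")` only works against a running Triton server; `build_generate_request` can be inspected without one.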

RobinPicard, May 09 '25 08:05

Is there a way we could use Triton's Python client?

rlouf, May 09 '25 10:05