Create the Triton model
This is a draft PR to validate the approach; it addresses issue #1551.
This PR creates an Outlines model that integrates with Triton servers running a TensorRT-LLM engine. It targets servers that rely on an ensemble with preprocessing and postprocessing steps, so that the endpoint can receive simple HTTP requests. For instance:
```bash
curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'
```
The request above is taken from the README of the tensorrtllm_backend repository.
Is there a way we could use Triton's Python client?
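A minimal sketch of what that could look like with the `tritonclient` package, assuming the ensemble exposes the same tensor names as the generate request above (`text_input`, `max_tokens`, `bad_words`, `stop_words`, `text_output`); the `[1, 1]` shapes follow the example clients in the tensorrtllm_backend repository and may need adjusting for a given model configuration:

```python
# Sketch only: tensor names and shapes are assumed from the tensorrtllm_backend
# ensemble example, not from this PR's implementation.
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype

client = httpclient.InferenceServerClient(url="localhost:8000")

def build_input(name, value, dtype):
    """Wrap a scalar as a [1, 1]-shaped numpy array and convert it to an InferInput."""
    arr = np.array([[value]], dtype=dtype)
    tensor = httpclient.InferInput(name, list(arr.shape), np_to_triton_dtype(arr.dtype))
    tensor.set_data_from_numpy(arr)
    return tensor

inputs = [
    build_input("text_input", "What is machine learning?", object),
    build_input("max_tokens", 20, np.int32),
    build_input("bad_words", "", object),
    build_input("stop_words", "", object),
]

result = client.infer(model_name="ensemble", inputs=inputs)
print(result.as_numpy("text_output"))
```

The gRPC client (`tritonclient.grpc`) exposes the same interface, and would presumably be the one to use if streaming from a decoupled model is needed.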