truss
Throw 500s when Triton encounters exceptions
:rocket: What
This PR updates the Triton / TRT-LLM template to return a 500 response when it encounters an exception. This applies only to the non-streaming use case.
:computer: How
When the template catches a Triton `InferenceServerException`, it raises a FastAPI `HTTPException` with status code 500. Thanks to the change in this PR, Truss automatically propagates the exception to the underlying FastAPI server, which can then pass more granular response types and status codes to the client.
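A minimal sketch of the pattern, assuming a non-streaming predict path that calls some Triton client function. `predict_non_streaming` and `run_inference` are hypothetical names for illustration, not the template's actual code. The `try`/`except ImportError` fallbacks are stand-ins that exist only so the snippet runs without `tritonclient` or `fastapi` installed; with those packages present, the real `tritonclient.utils.InferenceServerException` and `fastapi.HTTPException` are used.

```python
from typing import Any, Callable, Dict

try:
    from tritonclient.utils import InferenceServerException
except ImportError:
    # Stand-in so the sketch runs without tritonclient installed.
    class InferenceServerException(Exception):
        pass

try:
    from fastapi import HTTPException
except ImportError:
    # Minimal stand-in mirroring fastapi.HTTPException's status_code/detail.
    class HTTPException(Exception):
        def __init__(self, status_code: int, detail: Any = None) -> None:
            super().__init__(detail)
            self.status_code = status_code
            self.detail = detail


def predict_non_streaming(
    run_inference: Callable[[Dict[str, Any]], str], model_input: Dict[str, Any]
) -> str:
    """Hypothetical non-streaming predict path that maps Triton errors to HTTP 500."""
    try:
        return run_inference(model_input)
    except InferenceServerException as exc:
        # Raising HTTPException lets FastAPI build a 500 response whose body
        # carries this detail, instead of failing with an unhandled error.
        raise HTTPException(
            status_code=500, detail=f"Triton inference failed: {exc}"
        ) from exc


if __name__ == "__main__":
    def failing_inference(_: Dict[str, Any]) -> str:
        raise InferenceServerException("model is not ready")

    try:
        predict_non_streaming(failing_inference, {"prompt": "hi", "stream": False})
    except HTTPException as err:
        print(err.status_code)  # 500
```

Chaining with `from exc` keeps the original Triton error attached as `__cause__`, so server-side logs still show the underlying failure while the client sees a clean 500. Streaming responses are left unchanged, matching the scope described above.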
:microscope: Testing
I reproduced this code in an individual truss and confirmed that, when `stream` is set to `False`, exceptions return responses with a 500 status code and an appropriate error message.