text-generation-inference
Adds TensorRT-LLM backend to TGI
This PR adds a new custom backend to TGI, namely Nvidia TensorRT-LLM.
The underlying implementation relies on an automatically generated Rust <-> C++ binding living in include/ffi.h and src/ffi.cpp, which is then exposed to Rust through an externally defined component in src/lib.rs.
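To illustrate the shape such a binding layer exposes on the Rust side, here is a minimal sketch with the C++ side mocked in plain Rust; every type and function name below is hypothetical, not the actual generated API:

```rust
// Illustrative sketch only: in the real PR the binding is generated into
// include/ffi.h and src/ffi.cpp; here the C++ side is mocked in Rust and
// all names (TensorRtLlmBackend, create_backend, generate) are hypothetical.

// Stand-in for the opaque C++ backend handle the bridge would expose.
pub struct TensorRtLlmBackend {
    engine_path: String,
}

// Shape of a constructor src/lib.rs might re-export from the binding.
pub fn create_backend(engine_path: &str) -> Box<TensorRtLlmBackend> {
    Box::new(TensorRtLlmBackend {
        engine_path: engine_path.to_string(),
    })
}

// Shape of a generation call; the real implementation would cross the FFI
// boundary into TensorRT-LLM instead of formatting a string.
pub fn generate(backend: &TensorRtLlmBackend, prompt: &str) -> String {
    format!("[{}] completion for: {}", backend.engine_path, prompt)
}

fn main() {
    let backend = create_backend("/engines/llama");
    println!("{}", generate(&backend, "Hello"));
}
```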
This initial version is in a "working state" but definitely not as clean as it should be, especially with the use of many potentially unnecessary Arc<T>
along with a (please gods, forgive me ...) Box::leak
to keep the asynchronous context from being dropped while it is still being iterated over.
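For context, the Box::leak trick trades a one-time, never-freed heap allocation for a 'static lifetime, which sidesteps the drop problem at the cost of a deliberate leak. A minimal standalone sketch of the pattern (the GenerationContext type is made up for illustration):

```rust
// Sketch of the Box::leak pattern: leaking a Box<T> yields a &'static mut T,
// so a long-lived context (e.g. one handed to a spawned async task that keeps
// iterating over tokens) cannot be dropped out from under that task.
// The allocation is intentionally never freed.
struct GenerationContext {
    tokens: Vec<u32>,
}

fn main() {
    // Box::leak converts Box<GenerationContext> into &'static mut GenerationContext.
    let ctx: &'static mut GenerationContext =
        Box::leak(Box::new(GenerationContext { tokens: Vec::new() }));

    // The context now outlives the scope that created it and can be used
    // freely by code that requires 'static references.
    ctx.tokens.push(42);
    println!("{}", ctx.tokens.len());
}
```

A follow-up can replace this with a properly scoped owner (e.g. an Arc held by the task) so the memory is reclaimed when generation finishes.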
These most urgent concerns will be fixed ASAP in a follow-up PR, but I don't want to hold this one back and end up with an increasingly headache-prone rebase.