text-generation-inference
Adds TensorRT-LLM backend to TGI
This PR adds a new custom backend to TGI, namely Nvidia TensorRT-LLM.
The underlying implementation relies on an automatically generated Rust <-> C++ binding living in include/ffi.h and src/ffi.cpp, which is then exposed to Rust through an externally defined component in src/lib.rs.
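To illustrate the shape such a binding layer exposes on the Rust side, here is a minimal sketch with the C++ side mocked in plain Rust; every type and function name below is hypothetical, not the actual generated API:

```rust
// Illustrative sketch only: in the real PR the binding is generated into
// include/ffi.h and src/ffi.cpp; here the C++ side is mocked in Rust and
// all names (TensorRtLlmBackend, create_backend, generate) are hypothetical.

// Stand-in for the opaque C++ backend handle the bridge would expose.
pub struct TensorRtLlmBackend {
    engine_path: String,
}

// Shape of a constructor src/lib.rs might re-export from the binding.
pub fn create_backend(engine_path: &str) -> Box<TensorRtLlmBackend> {
    Box::new(TensorRtLlmBackend {
        engine_path: engine_path.to_string(),
    })
}

// Shape of a generation call; the real implementation would cross the FFI
// boundary into TensorRT-LLM instead of formatting a string.
pub fn generate(backend: &TensorRtLlmBackend, prompt: &str) -> String {
    format!("[{}] completion for: {}", backend.engine_path, prompt)
}

fn main() {
    let backend = create_backend("/engines/llama");
    println!("{}", generate(&backend, "Hello"));
}
```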
This initial version is in a "working state" but definitely not as clean as it should be, especially with the use of many potentially unnecessary Arc<T>
along with a (please gods, forgive me ...) Box::leak
to keep the asynchronous context from being dropped while it is still being iterated over.
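For context, the Box::leak trick trades a one-time, never-freed heap allocation for a 'static lifetime, which sidesteps the drop problem at the cost of a deliberate leak. A minimal standalone sketch of the pattern (the GenerationContext type is made up for illustration):

```rust
// Sketch of the Box::leak pattern: leaking a Box<T> yields a &'static mut T,
// so a long-lived context (e.g. one handed to a spawned async task that keeps
// iterating over tokens) cannot be dropped out from under that task.
// The allocation is intentionally never freed.
struct GenerationContext {
    tokens: Vec<u32>,
}

fn main() {
    // Box::leak converts Box<GenerationContext> into &'static mut GenerationContext.
    let ctx: &'static mut GenerationContext =
        Box::leak(Box::new(GenerationContext { tokens: Vec::new() }));

    // The context now outlives the scope that created it and can be used
    // freely by code that requires 'static references.
    ctx.tokens.push(42);
    println!("{}", ctx.tokens.len());
}
```

A follow-up can replace this with a properly scoped owner (e.g. an Arc held by the task) so the memory is reclaimed when generation finishes.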
These most urgent concerns will be fixed ASAP in a follow-up PR, but I don't want to hold this one back and end up with an increasingly headache-prone rebase.