
LLaMA inference support?

Open • double-vin opened this issue 2 years ago • 2 comments

May I ask when FasterTransformer will support C++ inference for LLaMA?

double-vin, Jul 24 '23 09:07

Based on FasterTransformer, we have implemented TurboMind, an efficient inference engine that supports both LLaMA and Llama 2.

lvhan028, Jul 25 '23 04:07

FasterTransformer development has transitioned to TensorRT-LLM, which already supports LLaMA. Please give it a try.

byshiue, Oct 20 '23 07:10