FasterTransformer
Does FasterTransformer support LLaMA inference?
May I ask when FasterTransformer will support C++ inference for LLaMA?
Based on FasterTransformer, we have implemented an efficient inference engine, TurboMind, which supports both LLaMA and LLaMA-2.
FasterTransformer development has transitioned to TensorRT-LLM, which already supports LLaMA. Please give it a try.