FasterTransformer
What's the difference between FasterTransformer and TensorRT
Is FasterTransformer built on top of TensorRT? And is FasterTransformer more efficient than TensorRT when performing inference with Transformer models (e.g., LLaMA)?
Also, what's the difference between FasterTransformer and Hugging Face's BetterTransformer?
https://github.com/NVIDIA/FasterTransformer/issues/211#issuecomment-1093495810
In my case, BetterTransformer from PyTorch is faster than FasterTransformer from NVIDIA (fp32, max length 512, RoBERTa-large).
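For context, BetterTransformer's speedup largely comes from routing attention through PyTorch's fused kernels. A minimal sketch of that fast path, assuming PyTorch >= 2.0 and hypothetical shapes roughly matching the benchmark above (seq len 512, head dim 64):

```python
import torch
import torch.nn.functional as F

# Hypothetical attention inputs: batch 2, 8 heads, seq len 512, head dim 64
q = torch.randn(2, 8, 512, 64)
k = torch.randn(2, 8, 512, 64)
v = torch.randn(2, 8, 512, 64)

# Fused scaled dot-product attention (the kernel BetterTransformer dispatches to
# on supported configurations), instead of separate matmul/softmax/matmul ops.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)
```

In fp32 on short-to-medium sequences, this fused path can outperform framework-external engines, which may explain the result above.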
Given the latest developments, how does this repo fare against TensorRT-LLM?