FasterTransformer
Speed in QAT mode=1 is the same as FP16
https://github.com/NVIDIA/FasterTransformer/blob/main/docs/vit_guide.md#int8-vs-fp16-speedup-on-vit
Model: vit_B_16, device: A100, batch size: 32
We use quant_mode=ft1, and the speed is almost the same as FP16. Is there any update on this case?
Have you tried quant_mode=ft2? That should be faster. The speedup results listed in vit_guide.md are for ft2.
Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.