FasterTransformer Speed in QAT mode=1 is the same as FP16

Open charlieguo0307 opened this issue 2 years ago • 1 comments

https://github.com/NVIDIA/FasterTransformer/blob/main/docs/vit_guide.md#int8-vs-fp16-speedup-on-vit

model: vit_B_16, device: A100, bs: 32

We use quant_mode=ft1, and the speed is almost the same as FP16. Is there any update on this case?

Sep 23 '22 02:09 charlieguo0307

Have you tried quant_mode=ft2? It should be faster, and the speedup results listed in vit_guide.md are the ft2 results.
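To compare the modes on your own setup, a rough timing sketch like the one below may help. It is only an illustration: `run_fp16` and `run_int8` are hypothetical stand-ins (they just sleep) and are not part of FasterTransformer; replace them with calls that run one batch through your FP16 and INT8 models. For GPU inference the callable has to block until the device has finished (for example by calling `torch.cuda.synchronize()` at the end), otherwise the wall-clock numbers are meaningless.

```python
import statistics
import time


def median_latency_ms(fn, warmup=3, iters=10):
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warm-up runs are discarded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """Ratio baseline/candidate; a value above 1.0 means the candidate is faster."""
    return baseline_ms / candidate_ms


# Hypothetical stand-ins for one forward pass in each mode (assumption:
# swap these for your real FP16 and INT8 inference calls).
def run_fp16():
    time.sleep(0.002)


def run_int8():
    time.sleep(0.001)


if __name__ == "__main__":
    fp16_ms = median_latency_ms(run_fp16)
    int8_ms = median_latency_ms(run_int8)
    print(f"FP16: {fp16_ms:.3f} ms, INT8: {int8_ms:.3f} ms, "
          f"speedup: {speedup(fp16_ms, int8_ms):.2f}x")
```

Running the same harness once per quant_mode with identical batch size and input shape keeps the comparison like for like.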

Sep 23 '22 03:09 Njuapp

Closing this bug because it has been inactive. Feel free to re-open it if you still have any problems.

Dec 02 '22 14:12 byshiue