smoothquant
How to use SmoothQuant in FasterTransformer?
I have built and run FasterTransformer, and I see that it has an --int8_mode parameter. Will it use SmoothQuant by default if I set int8_mode=1?
If not, is there an example of using SmoothQuant in FasterTransformer?
Thank you!
https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md