
Benchmarks use .half() (FP16)?

djstrong opened this issue 5 years ago · 3 comments

Do the torch versions in the benchmark https://github.com/Tencent/TurboTransformers/blob/master/docs/bert.md use .half() (FP16)?
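
For reference, a minimal sketch of what FP16 inference with .half() looks like using PyTorch and the transformers library; the model name and input text are illustrative assumptions, not taken from the benchmark:

```python
import torch
from transformers import BertModel, BertTokenizer

# Illustrative example only: convert a BERT model to FP16 with .half()
# and run inference on GPU.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval().cuda().half()

inputs = tokenizer("Hello world", return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}  # input ids stay integer dtype

with torch.no_grad():
    outputs = model(**inputs)  # weights and activations computed in FP16

print(outputs[0].dtype)  # torch.float16
```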

djstrong avatar Aug 03 '20 08:08 djstrong

No, we use FP32.

feifeibear avatar Aug 04 '20 01:08 feifeibear

With the transformers library, FP16 on GPU usually does not change the scores, but inference is 3-4 times faster. I would like to see FP16 benchmarks for turbotransformers.
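
A rough sketch of the kind of micro-benchmark behind such a comparison; the model, batch size, sequence length, and iteration count here are arbitrary assumptions, not numbers from this thread:

```python
import time
import torch
from transformers import BertModel

def time_model(model, input_ids, iters=100):
    # Warm up, then synchronize around the timed loop for accurate GPU timing.
    with torch.no_grad():
        for _ in range(10):
            model(input_ids)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(input_ids)
        torch.cuda.synchronize()
    return (time.time() - start) / iters

input_ids = torch.randint(0, 30000, (1, 128), device="cuda")
model = BertModel.from_pretrained("bert-base-uncased").eval().cuda()

print("FP32 latency per run:", time_model(model, input_ids))
# .half() converts the same module in place to FP16 before the second run.
print("FP16 latency per run:", time_model(model.half(), input_ids))
```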

djstrong avatar Aug 04 '20 09:08 djstrong

Interesting. Feedback from our customers indicates our FP32 version is fast enough. We believe quantization on CPU is in higher demand, so we currently have no plan for GPU FP16. We will do it later.

feifeibear avatar Aug 04 '20 09:08 feifeibear