
Benchmarks use .half() (FP16)?

djstrong opened this issue 5 years ago · 3 comments

Do the torch versions in the benchmark https://github.com/Tencent/TurboTransformers/blob/master/docs/bert.md use .half() (FP16)?
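
For reference, a minimal sketch of what FP16 inference with .half() looks like using PyTorch and the transformers library; the model name and input text are illustrative assumptions, not taken from the benchmark:

```python
import torch
from transformers import BertModel, BertTokenizer

# Illustrative example only: convert a BERT model to FP16 with .half()
# and run inference on GPU.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval().cuda().half()

inputs = tokenizer("Hello world", return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}  # input ids stay integer dtype

with torch.no_grad():
    outputs = model(**inputs)  # weights and activations computed in FP16

print(outputs[0].dtype)  # torch.float16
```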

djstrong avatar Aug 03 '20 08:08 djstrong

No, we use FP32.

feifeibear avatar Aug 04 '20 01:08 feifeibear

With the transformers library, FP16 on GPU usually does not change the scores, but inference is 3-4 times faster. I would like to see FP16 benchmarks for turbotransformers.
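
A rough sketch of the kind of micro-benchmark behind such a comparison; the model, batch size, sequence length, and iteration count here are arbitrary assumptions, not numbers from this thread:

```python
import time
import torch
from transformers import BertModel

def time_model(model, input_ids, iters=100):
    # Warm up, then synchronize around the timed loop for accurate GPU timing.
    with torch.no_grad():
        for _ in range(10):
            model(input_ids)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(input_ids)
        torch.cuda.synchronize()
    return (time.time() - start) / iters

input_ids = torch.randint(0, 30000, (1, 128), device="cuda")
model = BertModel.from_pretrained("bert-base-uncased").eval().cuda()

print("FP32 latency per run:", time_model(model, input_ids))
# .half() converts the same module in place to FP16 before the second run.
print("FP16 latency per run:", time_model(model.half(), input_ids))
```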

djstrong avatar Aug 04 '20 09:08 djstrong

Interesting. Feedback from our customers indicates our FP32 version is fast enough. We believe quantization on CPU is in higher demand, so we currently have no plan for GPU FP16. We will do it later.

feifeibear avatar Aug 04 '20 09:08 feifeibear