
fastertransformer speed slower than pytorch

Open lucasjinreal opened this issue 3 years ago • 2 comments

I am running ViT and got an unexpected result:

FP32 op time :  2464.206495285034 ms
FP32 torch time :  2419.6650743484497 ms

it's even slower than PyTorch...

lucasjinreal avatar Sep 21 '22 07:09 lucasjinreal

It is a possible case, because GEMM takes almost all of the time under FP32.

In such cases, small noise in the GEMM timing can affect the measured latency noticeably. In your case, the relative difference in latency is about 2%, which may just be noise. For such cases, FT and PyTorch should have similar latency.

We don't suggest using FP32 for transformer models, because FP16 brings a lot of speedup without an accuracy drop.
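As an aside, a ~2% gap can easily come and go depending on how the timing is done: warm up first (to exclude one-time costs such as kernel autotuning and cache fills), synchronize the device before reading the clock for CUDA ops, and report the median over many runs rather than a single total. A minimal stdlib sketch of that pattern, with hypothetical callables in the usage comment:

```python
import statistics
import time

def benchmark_ms(run_op, warmup=10, iters=100):
    """Time a callable: discard warmup runs, then report the median latency in ms."""
    for _ in range(warmup):
        run_op()  # warmup iterations are not measured
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_op()
        samples.append((time.perf_counter() - start) * 1000.0)  # ms
    return statistics.median(samples)

# Usage sketch (model_ft / model_torch / x are hypothetical; for CUDA ops,
# call torch.cuda.synchronize() inside the lambda before returning):
# ft_ms    = benchmark_ms(lambda: model_ft(x))
# torch_ms = benchmark_ms(lambda: model_torch(x))
```

The median is less sensitive to outlier runs (scheduler hiccups, clock changes) than the mean, which is exactly the kind of noise being discussed here.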

byshiue avatar Sep 22 '22 01:09 byshiue

@byshiue I found I didn't run the GEMM search, so the default GEMM algorithm was used. Will searching for the best algorithm boost the speed a bit? And can the gemm info file be reused across different PCs with the same GPU card model?

lucasjinreal avatar Sep 22 '22 03:09 lucasjinreal

Sorry for the delayed reply. Searching for the best algorithm may improve the speed; it is case by case. In general, the gemm info file can be used on different devices with the same GPU.
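One way to make that reuse safe is to record which GPU model a config was tuned on and only copy the file onto machines whose GPU matches, rerunning the search otherwise. This is just an illustrative sketch, not part of FasterTransformer itself; the file name `gemm_config.in` comes from FT's GEMM search output, while the function and the way GPU names are obtained are assumptions:

```python
import shutil
from pathlib import Path

def reuse_gemm_config(src: Path, dst_dir: Path, tuned_gpu: str, local_gpu: str) -> bool:
    """Copy a tuned gemm_config.in only when the local GPU model matches the
    one the file was tuned on. Returns False when the caller should instead
    rerun the GEMM search on the local machine."""
    if tuned_gpu != local_gpu:
        return False  # different GPU model: tuned algorithms may not be optimal
    shutil.copy(src, dst_dir / src.name)
    return True

# Usage sketch (GPU names would come from e.g. nvidia-smi or
# cudaGetDeviceProperties; "NVIDIA A100" here is hypothetical):
# ok = reuse_gemm_config(Path("gemm_config.in"), Path("."), "NVIDIA A100", local_gpu)
```

Comparing full device names rather than just the architecture is the conservative choice, since different SKUs of the same architecture can prefer different GEMM algorithms.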

byshiue avatar Dec 02 '22 14:12 byshiue