FasterTransformer
fastertransformer speed slower than pytorch
I am running ViT and got an unexpected result:
FP32 op time : 2464.206495285034 ms
FP32 torch time : 2419.6650743484497 ms
It's even slower than PyTorch...
This is a possible case, because GEMM takes almost all of the time under FP32.
In such cases, small noise in the GEMM timing can visibly affect the measured latency. In your case, the relative difference in latency is about 2%, which may just be noise. For cases like this, FT and PyTorch should have similar latency.
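For reference, the "about 2%" figure can be checked directly from the timings posted above (a quick sketch; the variable names are mine):

```python
# Timings copied from the report above (milliseconds).
ft_ms = 2464.206495285034
torch_ms = 2419.6650743484497

# Relative latency difference of FT vs. PyTorch.
rel_diff = (ft_ms - torch_ms) / torch_ms
print(f"relative difference: {rel_diff:.2%}")  # → relative difference: 1.84%
```

A ~1.8% gap is within typical run-to-run variance for GEMM-dominated workloads, so repeated runs could easily flip which side looks faster.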
We don't suggest using FP32 for transformer models, because FP16 brings a large speedup without an accuracy drop.
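One reason FP16 helps so much is memory traffic: halving the element size halves the bytes a GEMM has to move (and tensor cores also run FP16 math faster). A rough sketch of the traffic argument, using a hypothetical ViT-Base-like GEMM shape:

```python
def gemm_bytes(m, n, k, bytes_per_elem):
    """Bytes touched for C[m,n] = A[m,k] @ B[k,n], counting each
    matrix once (ignores tiling/cache reuse; a back-of-envelope model)."""
    return (m * k + k * n + m * n) * bytes_per_elem

# Hypothetical shape: 3072 tokens x 768 hidden x 768 hidden.
m, n, k = 3072, 768, 768
fp32 = gemm_bytes(m, n, k, 4)  # 4 bytes per float32
fp16 = gemm_bytes(m, n, k, 2)  # 2 bytes per float16
print(fp32 / fp16)  # → 2.0: FP16 moves half the data
```

This is only a bandwidth model; the actual speedup depends on whether the kernel is bandwidth- or compute-bound.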
@byshiue I found that I hadn't run the GEMM search, so the default GEMM algorithm was used. Will searching for the best algorithm boost speed a bit? And can the gemm info file be reused across different PCs with the same GPU card model?
Sorry for the delayed reply. Searching for the best algorithm may improve the speed; it is case by case. In general, the gemm info file can be used on different devices with the same GPU.
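For readers unfamiliar with what a "GEMM search" does: in spirit, it times every candidate algorithm for a fixed problem shape and records the fastest one, so later runs can skip the search. A toy illustration of that idea (not FasterTransformer's actual implementation; the function and the stand-in "algorithms" are mine):

```python
import time

def pick_fastest(candidates, warmup=2, iters=5):
    """Toy autotuner: time each candidate and keep the fastest.
    Mimics, in spirit, what a GEMM-algorithm search does for one shape."""
    best_name, best_t = None, float("inf")
    for name, fn in candidates.items():
        for _ in range(warmup):   # warm-up runs to avoid cold-start noise
            fn()
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        elapsed = (time.perf_counter() - start) / iters
        if elapsed < best_t:
            best_name, best_t = name, elapsed
    return best_name

# Stand-in "algorithms" with different costs.
algos = {
    "slow": lambda: time.sleep(0.005),
    "fast": lambda: time.sleep(0.001),
}
print(pick_fastest(algos))  # → fast
```

Because the result depends only on the GPU's hardware characteristics for a given shape, the recorded choice generally transfers between machines with the same GPU model, which matches the advice above.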