About the inference time

Open yangzhou321 opened this issue 1 year ago • 0 comments

I am curious why the FAN-B-H network, despite having fewer parameters and a lower computational cost than ViT-B, takes roughly four times as long at inference. I measured both models: FAN-B-H took 20.8 ms per 100 runs and ViT-B took 4.8 ms. Training FAN-B-H is also significantly slower. Could this be because some computations in FAN-B-H are not parallelizable?
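For reference, here is a minimal sketch of the kind of timing harness that makes such a comparison fair. It is not the script used for the numbers above. The function name `benchmark` and its parameters are hypothetical; the sketch assumes the model call is wrapped in a zero-argument callable. If the models run on a GPU under PyTorch (an assumption, the setup is not stated here), `torch.cuda.synchronize` would be passed as `sync`, since GPU kernels launch asynchronously and the clock could otherwise stop before the work finishes.

```python
import time
from statistics import median


def benchmark(fn, warmup=10, runs=100, sync=None):
    """Return the median wall-clock time of one fn() call, in milliseconds.

    fn   -- zero-argument callable wrapping one forward pass (hypothetical)
    sync -- optional callable that blocks until queued asynchronous work
            is done, e.g. torch.cuda.synchronize for a PyTorch GPU model
    """
    # Warm-up calls absorb one-off costs (lazy initialization, caches).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for pending work before stopping the clock
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return median(times_ms)


if __name__ == "__main__":
    # Stand-in CPU workload; replace with a model forward pass.
    ms = benchmark(lambda: sum(i * i for i in range(10_000)), warmup=3, runs=20)
    print(f"{ms:.3f} ms per run")
```

Timing each call separately and reporting the median makes it clear whether a figure is per run or per batch of runs, and reduces the effect of occasional outliers.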

Nov 14 '24 02:11 yangzhou321