FAN
About the inference time
I am curious why the FAN-B-H network, despite having fewer parameters and a lower computational cost than ViT-B, has an inference time that is roughly four times longer. I measured the inference times of both models: FAN-B-H takes 20.8 ms per 100 runs and ViT-B takes 4.8 ms. Training FAN-B-H is also significantly slower. Could this be because some computations in FAN-B-H are not parallelizable?
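For reference, here is a minimal sketch of how such a latency comparison could be set up so that both models are timed the same way. It is not the script behind the numbers above. The `benchmark` helper and its `warmup`, `runs`, and `sync` parameters are hypothetical names chosen for illustration. On a GPU, `sync` would typically be `torch.cuda.synchronize`, since CUDA kernels launch asynchronously and an unsynchronized timer can stop before the forward pass has finished.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100, sync=None):
    """Return (mean, standard deviation) of fn's latency in milliseconds.

    `sync` is an optional callable that blocks until queued device work
    has finished (hypothetical parameter; on CUDA this would usually be
    torch.cuda.synchronize).
    """
    # Warm-up iterations absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


if __name__ == "__main__":
    # Stand-in workload; replace with a model forward pass, e.g.
    # lambda: model(x), and pass sync=torch.cuda.synchronize on a GPU.
    mean_ms, std_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(f"{mean_ms:.3f} ms +/- {std_ms:.3f} ms per call")
```

Timing FAN-B-H and ViT-B with the same input shape, batch size, and precision through one helper like this would help rule out measurement differences as the source of the gap.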