SnapFusion
Why is the CA latency of the Down-1/8 block so high on iPhone 14 Pro?
The figure above shows the latency analysis of the cross-attention (CA) and ResNet blocks from your paper.
Down-1/8 has 2 CA blocks while Up-1/8 has 3, and the blocks have the same shape and computational workload. Yet Up-1/8 shows lower latency than Down-1/8, which doesn't make sense to me; see the sketch below for the comparison I have in mind.
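For reference, here is a minimal PyTorch sketch of that comparison; the token count, channel width, and context length are assumptions for a 1/8-resolution stage, not the paper's exact numbers. Two CA blocks with identical shapes should show essentially identical per-block latency:

```python
import time
import torch
import torch.nn as nn

# Assumed dimensions for a 1/8-resolution UNet stage -- placeholders,
# not the exact values from the SnapFusion paper.
SEQ_LEN = 64 * 64   # number of spatial tokens at 1/8 resolution
DIM     = 320       # channel width of the stage
CTX_DIM = 768       # text-encoder embedding width
CTX_LEN = 77        # text-encoder context length
HEADS   = 8

class CrossAttention(nn.Module):
    """Plain cross-attention: image tokens attend to text tokens."""
    def __init__(self, dim, ctx_dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            dim, heads, kdim=ctx_dim, vdim=ctx_dim, batch_first=True)

    def forward(self, x, ctx):
        out, _ = self.attn(x, ctx, ctx, need_weights=False)
        return out

@torch.no_grad()
def bench_ms(block, x, ctx, iters=50):
    # Warm up, then report the mean wall-clock time per forward pass.
    for _ in range(5):
        block(x, ctx)
    t0 = time.perf_counter()
    for _ in range(iters):
        block(x, ctx)
    return (time.perf_counter() - t0) / iters * 1e3

x   = torch.randn(1, SEQ_LEN, DIM)
ctx = torch.randn(1, CTX_LEN, CTX_DIM)

down_ca = CrossAttention(DIM, CTX_DIM, HEADS).eval()
up_ca   = CrossAttention(DIM, CTX_DIM, HEADS).eval()

print(f"Down-1/8 CA block: {bench_ms(down_ca, x, ctx):.2f} ms")
print(f"Up-1/8   CA block: {bench_ms(up_ca, x, ctx):.2f} ms")
```

If the two printed times match, the per-block workload really is identical, which makes the gap in your profile all the more puzzling.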
Could you explain this discrepancy? Thanks for your assistance.