SnapFusion
Why is the CA latency of the Down-1/8 block so high on iPhone 14 Pro?
The figure above shows the latency analysis of the cross-attention (CA) and ResNet blocks from your paper.
Down-1/8 has 2 CA blocks while Up-1/8 has 3, and the blocks have the same shape and computational workload. Yet Up-1/8 shows lower latency than Down-1/8, which doesn't make sense to me; see the sketch below for the comparison I have in mind.
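For reference, here is a minimal PyTorch sketch of that comparison; the token count, channel width, and context length are assumptions for a 1/8-resolution stage, not the paper's exact numbers. Two CA blocks with identical shapes should show essentially identical per-block latency:

```python
import time
import torch
import torch.nn as nn

# Assumed dimensions for a 1/8-resolution UNet stage -- placeholders,
# not the exact values from the SnapFusion paper.
SEQ_LEN = 64 * 64   # number of spatial tokens at 1/8 resolution
DIM     = 320       # channel width of the stage
CTX_DIM = 768       # text-encoder embedding width
CTX_LEN = 77        # text-encoder context length
HEADS   = 8

class CrossAttention(nn.Module):
    """Plain cross-attention: image tokens attend to text tokens."""
    def __init__(self, dim, ctx_dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            dim, heads, kdim=ctx_dim, vdim=ctx_dim, batch_first=True)

    def forward(self, x, ctx):
        out, _ = self.attn(x, ctx, ctx, need_weights=False)
        return out

@torch.no_grad()
def bench_ms(block, x, ctx, iters=50):
    # Warm up, then report the mean wall-clock time per forward pass.
    for _ in range(5):
        block(x, ctx)
    t0 = time.perf_counter()
    for _ in range(iters):
        block(x, ctx)
    return (time.perf_counter() - t0) / iters * 1e3

x   = torch.randn(1, SEQ_LEN, DIM)
ctx = torch.randn(1, CTX_LEN, CTX_DIM)

down_ca = CrossAttention(DIM, CTX_DIM, HEADS).eval()
up_ca   = CrossAttention(DIM, CTX_DIM, HEADS).eval()

print(f"Down-1/8 CA block: {bench_ms(down_ca, x, ctx):.2f} ms")
print(f"Up-1/8   CA block: {bench_ms(up_ca, x, ctx):.2f} ms")
```

If the two printed times match, the per-block workload really is identical, which makes the gap in your profile all the more puzzling.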
Could you explain this discrepancy? Thanks for your assistance.