tingqli
The key/value (kv) projection in cross-attention is recomputed at every decoding step, which is redundant because encoder_outputs does not change during the decoding phase. This PR adds a simple caching mechanism in cross-attn...
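The idea can be illustrated with a minimal sketch in plain Python. This is not the PR's actual code (the description above is truncated); the class `CachedCrossAttention`, its `kv_cache` attribute, `reset_cache`, and the `kv_projection_calls` counter are all hypothetical names chosen for illustration, and a single attention head with list-based matrices is assumed to keep the example self-contained.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class CachedCrossAttention:
    """Hypothetical single-head cross-attention that projects K/V only once."""

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.kv_cache = None          # (keys, values), filled on the first step
        self.kv_projection_calls = 0  # counter, only to make the saving visible

    def reset_cache(self):
        """Must be called before decoding a new input (new encoder_outputs)."""
        self.kv_cache = None

    def forward(self, decoder_state, encoder_outputs):
        if self.kv_cache is None:
            # encoder_outputs is fixed for the whole decode, so project it once.
            keys = matmul(encoder_outputs, self.w_k)
            values = matmul(encoder_outputs, self.w_v)
            self.kv_cache = (keys, values)
            self.kv_projection_calls += 1
        keys, values = self.kv_cache

        # Only the query depends on the current decoding step.
        query = matmul([decoder_state], self.w_q)[0]
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        # Attention output: weighted sum of the cached value rows.
        return [sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))]


# Usage: three decoding steps over the same encoder outputs.
identity = [[1.0, 0.0], [0.0, 1.0]]  # toy projection weights (assumption)
attn = CachedCrossAttention(identity, identity, identity)
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for step_state in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    context = attn.forward(step_state, encoder_outputs)
```

In this sketch the kv-projection runs once rather than once per step. The trade-off is that the cache has to be invalidated explicitly (here via `reset_cache`) whenever a new input sequence is encoded; otherwise stale keys and values would be reused.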