Why is the result of context-parallel DotProductAttention influenced by the random seed?

Open · LitPrice opened this issue

Hi! When I replace the regular attention computation with context-parallel DotProductAttention, I find that the results of DotProductAttention change with the random seed, and its outputs do not exactly match those of the regular attention. How can I resolve this?

Jun 25 '24 02:06 LitPrice