Cospui
I have the same problem.
I also ran into this problem. It seems to be caused by poor initialization and sampling, or by ReLU, which leaves training stuck in a local optimum. Changing the seed (e.g., uncomment...
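For anyone who wants to try the seed suggestion, here is a minimal sketch of a seeding helper. The name `set_seed` and the seed values are hypothetical and not taken from this repository; NumPy and PyTorch are seeded only if they happen to be installed.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the common RNGs so a run can be repeated or varied on purpose."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass


# Same seed, same draw from Python's RNG.
set_seed(0)
first_draw = random.random()
set_seed(0)
second_draw = random.random()
```

If training only gets stuck for some seeds, that would be consistent with the initialization explanation above, though it does not prove it.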
Thanks for your reply. As for the attention bias, I use
```
attn_bias, x = xformers.ops.fmha.BlockDiagonalMask.from_tensor_list(x)
attn_bias = attn_bias.make_causal()
```
I set `model.to(device)` in each rank, but I did not...
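To make the attention pattern concrete, here is a small pure-Python sketch of what a block-diagonal causal mask allows when several sequences are packed along the token dimension. It only illustrates the pattern and is not the xformers implementation; the function name and the example lengths `[2, 3]` are made up for illustration.

```python
from typing import List


def block_diagonal_causal_mask(seqlens: List[int]) -> List[List[bool]]:
    """allowed[i][j] is True when query token i may attend to key token j."""
    # Map each packed token position to the index of the sequence it came from.
    block_of = [b for b, n in enumerate(seqlens) for _ in range(n)]
    total = len(block_of)
    # A token sees only earlier-or-equal positions inside its own sequence.
    return [
        [block_of[i] == block_of[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


mask = block_diagonal_causal_mask([2, 3])
for row in mask:
    print("".join("x" if ok else "." for ok in row))
# x....
# xx...
# ..x..
# ..xx.
# ..xxx
```

Each sequence gets its own lower-triangular block, and no token attends across a sequence boundary.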