3 comments by Rayleigh

Hi @wenhui0924, if the attention is decoupled, is pre-training still mask-then-predict? In other words, is pre-training equivalent to training the two modalities separately with two transformer models?
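To make the question concrete, here is a minimal sketch of what fully decoupled pre-training would reduce to: two encoders that each do mask-then-predict on their own modality, with no cross-modal attention anywhere. All module and head names here are hypothetical illustrations, not the released BEiT-3 code.

```python
import torch
import torch.nn as nn

class DecoupledMaskedPretrain(nn.Module):
    """Hypothetical sketch: masked prediction with per-modality encoders only."""

    def __init__(self, dim=768, vocab_img=8192, vocab_txt=30522):
        super().__init__()
        def encoder():
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=12, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=2)
        self.img_encoder = encoder()
        self.txt_encoder = encoder()
        self.img_head = nn.Linear(dim, vocab_img)  # predict masked visual tokens
        self.txt_head = nn.Linear(dim, vocab_txt)  # predict masked word pieces

    def forward(self, img_emb, txt_emb):
        # Each modality attends only within itself: if this were the whole
        # story, the two streams would never exchange information.
        return (self.img_head(self.img_encoder(img_emb)),
                self.txt_head(self.txt_encoder(txt_emb)))
```

If pre-training really reduced to this, the two encoders would be trainable independently, which is exactly the equivalence the question is probing.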

Hi @wenhui0924. To handle the two modalities, the released BEiT-3 model has two independent modules, A and B, but there seems to be no self-attention between the two modules? Could you...
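For contrast with the fully decoupled sketch above, here is a hedged sketch of the multiway pattern the comment seems to be asking about: modules A and B each own a feed-forward expert, while a single self-attention layer is shared over the concatenated sequence. Names and the `split` routing are illustrative assumptions, not the released BEiT-3 implementation.

```python
import torch
import torch.nn as nn

class MultiwayBlock(nn.Module):
    """One block: shared self-attention, per-modality FFN experts A and B."""

    def __init__(self, dim=768, nhead=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, nhead, batch_first=True)
        # Module A (e.g., vision) and module B (e.g., language) experts.
        self.ffn_a = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ffn_b = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, split):
        # x: (batch, seq, dim); tokens [:split] belong to A, [split:] to B.
        h = self.norm1(x)
        # Shared self-attention over the *joint* sequence: A and B tokens
        # attend to each other here, even though the FFNs below are separate.
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        h = self.norm2(x)
        # Route each modality's tokens through its own expert.
        out = torch.cat([self.ffn_a(h[:, :split]), self.ffn_b(h[:, split:])], dim=1)
        return x + out
```

Under this reading, the "independent modules" would be only the FFN experts, and cross-modal interaction would come from the shared attention over the concatenated tokens; whether that matches the released model is the point being put to @wenhui0924.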

Hi @greentfrapp, thank you for your reply. In my opinion, the core of a PyTorch reproduction is consistency of performance, not the Svelte components. Regarding...