rk37
There is no causal mask in the attention layer. Is that because the model is designed for classification rather than generation?
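To make the question concrete, here is a minimal sketch of the difference I mean. It assumes plain single-head scaled dot-product attention in NumPy; the function name `attention` and the `causal` flag are hypothetical and are not taken from this repository's code.

```python
import numpy as np

def attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention; q, k, v have shape (seq_len, d).

    Illustrative only: not this repository's implementation.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # Hide positions j > i so token i cannot attend to later tokens.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, 8 features (arbitrary sizes)
_, w_full = attention(x, x, x, causal=False)
_, w_causal = attention(x, x, x, causal=True)
```

With `causal=False`, every token attends to every other token, which is the usual setup for encoder-style models used for classification. With `causal=True`, the upper triangle of the attention weights is zero, which is what autoregressive generation normally requires.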