rk37

1 issue by rk37

There is no causal mask in the attention layer. Is that because the model is designed for classification rather than generation?
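
For context on the question, here is a minimal NumPy sketch of what a causal mask would change. It is not taken from this repository: the function name `attention_weights` and the toy score matrix are hypothetical, and it only illustrates the general technique. A causal mask hides positions j > i from position i, which generation typically needs; without it, every token attends to the whole sequence, which is common for encoder-style models such as classifiers.

```python
import numpy as np

def attention_weights(scores, causal=False):
    """Softmax over the last axis of a (seq_len, seq_len) score matrix.

    Hypothetical helper for illustration only.
    With causal=True, position i may only attend to positions j <= i.
    """
    scores = np.asarray(scores, dtype=float)
    if causal:
        seq_len = scores.shape[-1]
        # True above the diagonal marks "future" positions to hide.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax; masked entries become exactly 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy example: equal scores for a 3-token sequence.
scores = np.zeros((3, 3))
bidirectional = attention_weights(scores)        # no mask: each row spreads over all 3 positions
causal = attention_weights(scores, causal=True)  # mask: row i spreads over positions 0..i only
```

If the attention layer here matches the unmasked case, that would be consistent with a classification-oriented design, but only the maintainers can confirm the intent.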