Ahmedd-Wahdan

Results: 1 issue by Ahmedd-Wahdan

The input to the transformer is (B,T), and the output from the MLP is also (B,T); we only use the embeddings of the last column to predict the next...
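
The "last column" step can be illustrated with a minimal sketch in plain Python. It assumes each of the T positions carries a vector of V scores, so the output has shape (B, T, V); the snippet above does not state this extra dimension. The names `outputs`, `last_position`, and `argmax`, and all the numbers, are hypothetical and not taken from the issue.

```python
# Hypothetical toy shapes: B sequences, T positions, V scores per position.
B, T, V = 2, 3, 4

# Stand-in for per-position model outputs, shape (B, T, V).
# The values are made up purely for illustration.
outputs = [
    [[0.1, 0.2, 0.3, 0.4], [0.9, 0.0, 0.1, 0.0], [0.0, 0.7, 0.2, 0.1]],
    [[0.5, 0.1, 0.2, 0.2], [0.1, 0.1, 0.1, 0.7], [0.2, 0.1, 0.6, 0.1]],
]


def last_position(batch):
    """Keep only the final position of each sequence: (B, T, V) -> (B, V)."""
    return [seq[-1] for seq in batch]


def argmax(scores):
    """Return the index of the largest value in a flat list of scores."""
    return max(range(len(scores)), key=lambda i: scores[i])


last = last_position(outputs)                 # shape (B, V)
next_tokens = [argmax(row) for row in last]   # one predicted index per sequence
print(next_tokens)
```

Only the final position is read here because, in a causal setup, it is the one position that has seen the whole input sequence; the earlier positions are typically used as training targets rather than at generation time.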