Ahmedd-Wahdan
The input to the transformer is (B,T), the output from the MLP is also (B,T), and we only use the embeddings of the last column to predict the next...
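A minimal sketch of the "last column only" step, in case it helps make the shapes concrete. It rests on assumptions that are not in the issue text: that the truncated sentence is about next-token prediction, that each of the T positions carries a C-dimensional embedding (so the per-position output is (B,T,C) rather than a bare (B,T)), and that a linear projection maps an embedding to vocabulary logits. The names `next_token_logits`, `hidden`, and `w_out` are hypothetical and the values are random, so this only illustrates indexing, not any particular model.

```python
import numpy as np

def next_token_logits(hidden, w_out):
    """Project only the last time step to vocabulary logits.

    hidden: (B, T, C) per-position embeddings (assumed shape).
    w_out:  (C, V) output projection (hypothetical weights).
    Returns logits of shape (B, V).
    """
    last = hidden[:, -1, :]   # keep the last column: (B, C)
    return last @ w_out       # (B, V)

# Illustrative sizes only; none of these come from the issue.
B, T, C, V = 2, 5, 8, 11
rng = np.random.default_rng(0)
hidden = rng.normal(size=(B, T, C))
w_out = rng.normal(size=(C, V))

logits = next_token_logits(hidden, w_out)
next_ids = logits.argmax(axis=-1)  # greedy pick, one token id per batch row
```

Slicing before the projection gives the same result as projecting every position and then taking the last one; it just avoids computing logits for the T-1 positions that are discarded at inference time.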