fasil-saidalavi
I have trained a 1.3B model using both the Differential Transformer and the standard Transformer. I observed a slight improvement in LLM evaluation scores for the Differential Transformer variant, and...