fasil-saidalavi

1 issue by fasil-saidalavi

I have trained a 1.3B model using both the Differential Transformer and the standard Transformer. I observed a slight improvement in LLM evaluation scores for the Differential Transformer variant, and...