Chichi
He just wants to avoid very large logits, which can overflow and lead to the NaN problem. He believes it makes no difference in terms of optimization.
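The comment does not say which technique is being discussed. As one common illustration (an assumption here, not necessarily what he had in mind), the sketch below shows how large logits make a naive softmax produce NaNs: `exp` overflows to `inf`, and `inf / inf` is `nan`. It also shows the max-subtraction trick, which is mathematically identical and therefore leaves the outputs unchanged, while keeping every exponent at or below zero. The helper names (`ieee_exp`, `naive_softmax`, `stable_softmax`) are hypothetical.

```python
import math


def ieee_exp(x: float) -> float:
    """Hypothetical helper: exp() that overflows to +inf the way IEEE-754
    float math does (as in NumPy/PyTorch), instead of raising OverflowError
    as Python's math.exp does."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def naive_softmax(logits):
    # Exponentiates the raw logits: exp(1000) overflows to inf,
    # and inf / inf evaluates to nan.
    exps = [ieee_exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(logits):
    # Subtracting the max logit m does not change softmax, because
    # exp(z - m) / sum(exp(z_j - m)) == exp(z) / sum(exp(z_j)),
    # but every exponent is now <= 0, so exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

print(naive_softmax(small))
print(stable_softmax(small))
print(naive_softmax(large))   # [nan, nan, nan]
print(stable_softmax(large))  # same values as stable_softmax(small)
```

Because only the constant offset differs, `small` and `large` yield exactly the same probabilities under the stable version, so any gradient-based optimization that consumes these probabilities sees the same values; only the numerical failure mode goes away.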