Chichi

He just wants to avoid very large logits, which lead to NaN values. He believes the change is equivalent in terms of optimization.
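The comment doesn't say which change is meant. The sketch below assumes it is something like the standard max-subtraction trick for softmax. That trick keeps huge logits from overflowing and leaves the output probabilities mathematically unchanged. Because the softmax output is identical, the cross-entropy loss and its gradient with respect to the logits (softmax minus the one-hot target) are also unchanged, which would make it "the same in terms of optimization." The function names `naive_softmax` and `stable_softmax` are hypothetical and are used only for illustration.

```python
import math

def naive_softmax(logits):
    # Exponentiates raw logits directly; math.exp raises OverflowError
    # for inputs above roughly 709 (frameworks like NumPy give inf, then NaN).
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    # Subtracting the max logit m does not change the result, since
    # exp(z - m) / sum(exp(z_j - m)) == exp(z) / sum(exp(z_j)),
    # but it keeps every exponent <= 0, so nothing overflows.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

print(naive_softmax(small))
print(stable_softmax(small))

try:
    naive_softmax(large)
except OverflowError:
    print("naive softmax overflowed")

# Same probabilities as for `small`, because only the differences between logits matter.
print(stable_softmax(large))
```

This works because softmax depends only on the differences between logits. If the actual change instead rescales the logits (for example, dividing by a constant), the outputs do change, and the equivalence would need a separate argument.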