Inquiry about the Definition of "Training Gradient Norm" in SANA-Sprint's Figure 3 Subplots

Open Hanbo-Cheng opened this issue 5 months ago • 0 comments

Hello, I would like to ask about Figure 3 in SANA-Sprint. In the two subplots, does the "training gradient norm" refer to the gradient of the loss with respect to the trainable parameters $\theta$, or to the norm of the time derivative $dF/dt$?

I ask because the loss computation already normalizes $dF/dt$ (via $g \leftarrow g / (\|g\| + c)$). Logically, an excessively large $dF/dt$ should therefore not significantly affect training stability. Is that understanding correct?
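To make the question concrete, here is a minimal sketch of the normalization I have in mind. It is not taken from the SANA-Sprint code: the function names are hypothetical, the tangent is treated as a flat list of floats, and `c = 0.1` is only an assumed value for the constant. It shows that the norm of the normalized tangent is `||g|| / (||g|| + c)`, which stays below 1 however large the raw `dF/dt` is.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalize_tangent(g, c=0.1):
    """Return g / (||g|| + c).

    `g` stands in for the dF/dt tangent; `c` is an assumed constant,
    not a value confirmed by the paper or the repository.
    """
    scale = l2_norm(g) + c
    return [x / scale for x in g]


# A small tangent and a very large one (illustrative values only).
small = normalize_tangent([0.03, 0.04])  # raw norm is about 0.05
large = normalize_tangent([3e5, 4e5])    # raw norm is about 5e5

# Both normalized norms are strictly below 1.
print(l2_norm(small), l2_norm(large))
```

If this matches what the loss does, the magnitude of `dF/dt` reaching the loss is bounded, which is why I am unsure which quantity Figure 3 plots.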

May 31 '25 01:05 Hanbo-Cheng
