Zip Zou

24 comments by Zip Zou

Good job!! But I think it would be better to reformat all the code with `yapf` according to `.style.cfg`. This could reduce the `diff` and keep the style consistent...

During each forward pass in HF’s `Trainer`, the global loss is divided by `accumulation_steps` so that the accumulated gradients are averaged when `step()` is called. Most of the time, the...
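To illustrate why the division by `accumulation_steps` gives an averaged gradient, here is a minimal pure-Python sketch. It is not the `Trainer` code: the one-parameter model `y = w * x`, the helper names `grad_mse` and `accumulated_grad`, and the toy data are all assumptions made for the example. It assumes equal-sized micro-batches, in which case the scaled, accumulated gradient matches the gradient of the mean loss over the full batch.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, micro_batches, accumulation_steps):
    """Sum per-micro-batch gradients, each scaled by 1 / accumulation_steps.

    Scaling the gradient is equivalent to scaling the loss before the
    backward pass, because differentiation is linear.
    """
    total = 0.0
    for xs, ys in micro_batches:
        total += grad_mse(w, xs, ys) / accumulation_steps
    return total


# Hypothetical toy data: one full batch split into two equal micro-batches.
w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
micro_batches = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

full_batch_grad = grad_mse(w, xs, ys)
accumulated = accumulated_grad(w, micro_batches, accumulation_steps=2)

print(full_batch_grad, accumulated)
```

If the micro-batches have different sizes, the two values generally differ, because each micro-batch mean then receives the same weight regardless of how many samples it contains.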

I understand what you mean now, and you are right that there is an unnecessary division to calculate the average loss. However, I think the current code still meets my...

Could you provide a minimal code example that reproduces this problem? When outputting additional metrics, especially in tensorboard, there is a tag such as `train` or `eval` to distinguish between different...