llm.c

Published 20 hours ago


Enhance gradient norm calc in gpt2_update: reuse variables, clarify first pass logic, improve condition handling

Open • bgorlick opened this issue 8 months ago • 0 comments

The gradient norm calculation is improved by:

- Reusing variables (ShardInfo tensor and ShardInfo shard) to reduce redundancy and enhance readability.
- Introducing an is_first_pass flag to clearly mark the first loop iteration.
- Refining the condition handling slightly.
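
For illustration only, here is a minimal sketch of the pattern described above, written as plain CPU C rather than taken from the repository. The `ShardInfo` struct layout, `get_shard`, `accumulate_norm_squared`, and `local_grad_norm_squared` are all hypothetical stand-ins, and the sketch assumes each tensor divides evenly across ranks. It shows the two ideas from the list: each loop iteration computes `ShardInfo tensor` and `ShardInfo shard` once and reuses them, and an `is_first_pass` flag tells the accumulation step to reset a reused buffer on the first iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for ShardInfo: a span described by offset and size.
typedef struct {
    size_t offset;
    size_t size;
} ShardInfo;

// Hypothetical helper: the slice of a tensor with `size` elements that belongs
// to `rank` out of `num_ranks`. Assumes `size` divides evenly across ranks.
static ShardInfo get_shard(size_t size, int rank, int num_ranks) {
    assert(num_ranks > 0 && rank >= 0 && rank < num_ranks);
    assert(size % (size_t)num_ranks == 0);
    ShardInfo shard;
    shard.size = size / (size_t)num_ranks;
    shard.offset = (size_t)rank * shard.size;
    return shard;
}

// Adds the sum of squares of x[0..n) into *out. When is_first_pass is true,
// *out is zeroed first, so stale contents of a reused buffer are discarded.
static void accumulate_norm_squared(float* out, const float* x, size_t n,
                                    bool is_first_pass) {
    if (is_first_pass) { *out = 0.0f; }
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) { sum += x[i] * x[i]; }
    *out += sum;
}

// Squared L2 norm of the gradient elements owned by `rank`, accumulated into
// the caller-provided (possibly dirty) scratch buffer and returned.
static float local_grad_norm_squared(float* scratch, const float* grads,
                                     const ShardInfo* tensors, int num_tensors,
                                     int rank, int num_ranks) {
    if (num_tensors <= 0) { return 0.0f; }
    for (int i = 0; i < num_tensors; i++) {
        // Computed once per iteration and reused below.
        ShardInfo tensor = tensors[i];
        ShardInfo shard = get_shard(tensor.size, rank, num_ranks);
        bool is_first_pass = (i == 0);
        accumulate_norm_squared(scratch, grads + tensor.offset + shard.offset,
                                shard.size, is_first_pass);
    }
    return *scratch;
}
```

In a sharded setup, each rank's partial result would still need to be summed across ranks before taking the square root; that step is left out of this sketch.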

Jun 18 '24 01:06 bgorlick