llm.c
Enhance gradient norm calc in gpt2_update: reuse variables, clarify first pass logic, improve condition handling
The gradient norm calculation is improved by:
- Reusing variables (ShardInfo tensor and ShardInfo shard) to reduce redundancy and enhance readability.
- Introducing an is_first_pass flag that explicitly marks the first loop iteration.
- Refining the condition handling slightly.