Zijian Hu
Dear torchtitan team, I have a question regarding gradient norm clipping when using pipeline parallelism (PP), potentially combined with `FSDP/DP/TP`. For simplicity, let's assume each process/GPU has a single PP stage....
bug
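
To make the setup in the question concrete (one PP stage per process), here is a minimal pure-Python sketch of how clipping by a norm taken over all stages differs from clipping each stage on its own. It is an illustration under stated assumptions, not torchtitan's implementation: the function names are hypothetical, gradients are plain lists of floats, and the Python `sum` over stages stands in for what would be a SUM all-reduce across PP ranks in a real run.

```python
import math


def local_sq_norm(grads):
    """Sum of squared gradient entries held by one pipeline stage."""
    return sum(g * g for g in grads)


def clip_global(stage_grads, max_norm, eps=1e-6):
    """Clip every stage with one shared scale derived from the global L2 norm.

    stage_grads: list of per-stage gradient lists (hypothetical layout,
    one inner list per PP rank).
    """
    # Stand-in for an all-reduce (SUM) of the squared local norms over PP ranks.
    total_sq = sum(local_sq_norm(g) for g in stage_grads)
    total_norm = math.sqrt(total_sq)
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in grads] for grads in stage_grads]
    return clipped, total_norm


def clip_per_stage(stage_grads, max_norm, eps=1e-6):
    """Clip each stage using only its own local norm (no communication)."""
    clipped = []
    for grads in stage_grads:
        norm = math.sqrt(local_sq_norm(grads))
        scale = min(1.0, max_norm / (norm + eps))
        clipped.append([g * scale for g in grads])
    return clipped


def global_norm(stage_grads):
    """L2 norm over the gradients of all stages together."""
    return math.sqrt(sum(local_sq_norm(g) for g in stage_grads))


# Two stages: local norms are 5 and 12, so the global norm is 13.
stage_grads = [[3.0, 4.0], [0.0, 12.0]]
clipped_global, total_norm = clip_global(stage_grads, max_norm=1.0)
clipped_local = clip_per_stage(stage_grads, max_norm=1.0)
```

In this toy case the shared scale keeps the relative size of the two stages' gradients, while per-stage clipping rescales each stage independently and leaves a combined norm above `max_norm`.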