Zijian Hu

Results: 1 issue by Zijian Hu

Dear torchtitan team, I have a question regarding gradient norm clipping when using pipeline parallelism (PP), potentially combined with `FSDP/DP/TP`. For simplicity, let's assume each process/GPU has a single PP stage....

bug