AJ

3 comments by AJ

Could you please also look into the issue posted here as part of this: https://github.com/NVIDIA/Megatron-LM/issues/1462#issuecomment-2732642584?

Any update on this? Running into the same error now with the following setup: Megatron-LM@7ee599a, NeMo@633cb60, TransformerEngine@ab4fd3c, and configs:

```
model.use_te_rng_tracker: True
model.enable_cuda_graph: True
```

I see a similar doubling when CUDA graphs are turned on/off. Please see this comment for reference: https://github.com/NVIDIA/Megatron-LM/issues/1462#issuecomment-2732642584. However, the fix above doesn't resolve the doubling in the CUDA graphs case.