Parth Mannan
I am not sure if that is what happened here, but I do see an nvFuser failure show up in OOM errors. Might not directly be an nvFuser issue, but...
Yeah, the nvFuser output was likely just printed because the OOM happened during execution. I have seen that before.
That sounds pretty useful and should satisfy the requirement. Would calling this transformation generate a full computation trace, with the required tensor information, for every TensorProxy result? And I am...
> Hi @parthmannan, could you also start a main PR?

Added a PR for main here: https://github.com/NVIDIA/Megatron-LM/pull/2282. Will resolve conflicts shortly.
/ok to test https://github.com/NVIDIA/Megatron-LM/pull/2054/commits/e2a32cb54edf51e9cb000b7e8ab4b55e58e7d846
/ok to test https://github.com/NVIDIA/Megatron-LM/pull/2054/commits/d53c323bcfb6f088de4ad919adeccf615737f75a
/ok to test https://github.com/NVIDIA/Megatron-LM/pull/2054/commits/70b9758cefdf016eb30559693f9dfc6ad4a8e246
/ok to test https://github.com/NVIDIA/Megatron-LM/pull/2054/commits/9387269bf9a641293cbccd68bbbe4f1db874453d
/ok to test https://github.com/NVIDIA/Megatron-LM/pull/2054/commits/2bde6c85a99988ad4c3a9d37e633908dfb3e8323
/ok to test https://github.com/NVIDIA/Megatron-LM/pull/2054/commits/604ddd207fddb3f53adffb3e1190d94789a6e8b3