Nick Sarkauskas
Edit: the email formatting did not work properly. Hi, I'm interested in seeing the PR as well. When I originally made my PR, the curves looked as expected: flat for mup and...
This behavior is expected: the weights are reinitialized by the function at https://github.com/EleutherAI/gpt-neox/blob/43ea51c2f3aeef2fc642ba401ce08844eb5a0240/megatron/training.py#L446. Or do you mean this function does not get called?
@Sergei-Lebedev It seems like the same ucc test is failing here as in the active_set PR. However, this one fails because of a CUDA version mismatch: `nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.1, please update...
The ucc gtest failed on:

```
[ RUN ] test_tl_mlx5_dm.MemcpyToDeviceMemory/4
[2024-05-16T18:13:52.173Z] [swx-clx01:196 :0:196] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
```
On this PR, I logged into the CI node and opened a shell inside the container that was running this gtest. When I ran it myself, it passed: ``` swx-jenkins@swx-clx01:/opt/nvidia/src/ucc/build/test/gtest$ stdbuf -e0...
I updated the PR. The get/reduce/put phase and the barrier part of the algorithm now run via a schedule. I left the allgather phase the way it was inside of...
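To illustrate the general idea of running phases via a schedule, here is a minimal Python sketch. It assumes a schedule is just an ordered list of tasks in which each phase starts only after the previous one succeeds. The names (`run_schedule`, `get_reduce_put`, `barrier`) are hypothetical placeholders, not UCC's actual schedule API, and the phase bodies perform no real communication.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical phase type: takes shared state, returns True on success.
Phase = Callable[[Dict[str, list]], bool]


def get_reduce_put(state: Dict[str, list]) -> bool:
    # Placeholder for fetching peer data, reducing it locally, and writing it back.
    state["log"].append("get/reduce/put")
    return True


def barrier(state: Dict[str, list]) -> bool:
    # Placeholder for waiting until every rank has finished its puts.
    state["log"].append("barrier")
    return True


def run_schedule(phases: List[Tuple[str, Phase]], state: Dict[str, list]) -> bool:
    """Run phases strictly in order; stop at the first phase that fails."""
    for name, phase in phases:
        if not phase(state):
            state["log"].append(f"failed: {name}")
            return False
    return True


state = {"log": []}
ok = run_schedule([("get/reduce/put", get_reduce_put), ("barrier", barrier)], state)
print(ok, state["log"])  # True ['get/reduce/put', 'barrier']
```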
Please hold off on reviewing; there are some failures I need to fix first.
@samnordmann The PR is ready for review, thank you. Please note that the allgather task is still part of the algorithm. Once Ferrol's PR goes in I will convert the...
@Sergei-Lebedev @samnordmann The PR is ready to be merged.
> @nsarka What is the use case for enabling this? If removing the restriction - why only limit to broadcast?

This was requested by Almog from the cuBLAS team. I'm...