wdykas comments

Results: 20 comments of wdykas

[QUESTION] Why should CUDA_DEVICE_MAX_CONNECTIONS=1 should be set when using seq_parallel or async comm?

It enforces that kernels execute on the GPU in the same order in which they were queued from the host. It's for GEMM and TP communication overlap: it allows for scheduling the communication kernel...
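As a minimal sketch of how the variable might be set from a Python launcher: the helper name `configure_single_cuda_connection` is hypothetical, and the assumption here is that the variable has to be in the environment before the process initializes CUDA (for example, before the first CUDA call made by the framework).

```python
import os


def configure_single_cuda_connection(env=None):
    """Set CUDA_DEVICE_MAX_CONNECTIONS=1 in the given environment mapping.

    Hypothetical helper for illustration. Assumption: this is called before
    the process creates a CUDA context, since the setting is read at
    initialization time.
    """
    env = os.environ if env is None else env
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
    return env


# Call this at the very top of the training script, before importing or
# touching anything that initializes CUDA.
configure_single_cuda_connection()
```

Exporting the variable in the shell that launches the job is an equivalent option and avoids any dependence on import order.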

[Bug]: ERROR 07-26 14:50:35 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 214281 died, exit code: -11

Is there any solution here without using Ray?

Long Sequence Length Inference Mamba2: CUDA error: an illegal memory access was encountered

Fixed by changing indexing to int64 in the kernels.
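A rough illustration of why 32-bit indexing can break at long sequence lengths, assuming a flat offset computed as `row * row_stride`. The sizes below are hypothetical and this is a pure-Python simulation of integer wraparound, not the actual kernel code.

```python
import ctypes

INT32_MAX = 2**31 - 1


def flat_offset_int32(row, row_stride):
    # Simulate a signed 32-bit multiply: ctypes wraps silently on overflow.
    return ctypes.c_int32(row * row_stride).value


def flat_offset_int64(row, row_stride):
    # The same computation carried out in signed 64-bit arithmetic.
    return ctypes.c_int64(row * row_stride).value


# Hypothetical sizes: a long sequence with a wide per-token stride.
seq_len = 600_000
row_stride = 4096
last_row = seq_len - 1

off32 = flat_offset_int32(last_row, row_stride)
off64 = flat_offset_int64(last_row, row_stride)

print("int32 offset:", off32)  # wrapped value, no longer a valid index
print("int64 offset:", off64)  # the intended offset
print("exceeds INT32_MAX:", off64 > INT32_MAX)
```

Once the true offset exceeds `INT32_MAX`, the 32-bit result wraps to a negative number, and dereferencing it on the GPU would be consistent with the illegal memory access reported in the issue title.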

Add num_splits support for FA3 backend

> Could you please follow the instructions [here](https://github.com/NVIDIA/TransformerEngine/pull/2357/checks?check_run_id=55026790643) to fix the DCO? Thanks!

I think this is done?

Native weight resharding for Megatron RL

/ok to test 2190b222535877c9b9be596b30c2dda27a3e6205

Native weight resharding for Megatron RL

/ok to test [9641c38](https://github.com/NVIDIA/Megatron-LM/pull/2379/commits/9641c38a6a2dde146109bf81ba33d03ede95383b)

Native weight resharding for Megatron RL

/ok to test [12dc7ae](https://github.com/NVIDIA/Megatron-LM/pull/2379/commits/12dc7ae19c7aa512a629046e3d3ae88055e5e5d0)

Native weight resharding for Megatron RL

/ok to test [70272da](https://github.com/NVIDIA/Megatron-LM/pull/2379/commits/70272da9719189d4af7dd9bd176b5ae0eca2e9d3)

Native weight resharding for Megatron RL

/ok to test [cc5b44b](https://github.com/NVIDIA/Megatron-LM/pull/2379/commits/cc5b44bebf2a496ffe206a5a3f408396574a707f)

Batch Invariance

/ok to test 7254f0f7c0e6baea28be93c3f2fd32b7c2b452a5