Syed Tousif Ahmed

Results 26 comments of Syed Tousif Ahmed

Hi @dhananjays, I don't have a workaround for getting the exact signals that `ap_hs` used to infer. We ended up changing our interfaces to `axis`. So that would mean you...

Thanks @kwen2501! Addressed all the review comments. Will merge after CI is green and after I post a local run of the test on a machine that supports multicast.

Ran the test on an 8xH100 system with NVSwitch (supports multicast): `python test/distributed/test_c10d_nccl.py -k test_nccl_user_buffer_registration -v` ``` INFO:numba.cuda.cudadrv.driver:init Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py312_cu126/nccl_allocator/build.ninja... Building extension module nccl_allocator......

> This means we end up shipping 3 more .so files, right? I wonder if there isn't a better way to do this by including the files in the extension...

Closing, has been merged in https://github.com/pytorch/ao/pull/2278.

@IvanYashchuk You might want to check out selective activation checkpointing, available in PyTorch nightlies, to specify which activations to save for backward: https://pytorch.org/docs/main/checkpoint.html#torch.utils.checkpoint.create_selective_checkpoint_contexts