Pruthvi Madugundu

Results 9 issues of Pruthvi Madugundu

This PR updates PR [#73040](https://github.com/pytorch/pytorch/pull/73040). With these changes, PyTorch compiles successfully with ROCm when `NDEBUG` is enabled. Solution: For HIP we keep `__device__ __assert_fail()`...

module: rocm
triaged
open source
cla signed
ciflow/trunk
ciflow/periodic

cc @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport

module: rocm
open source
ciflow/trunk
topic: not user facing

cc @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport

module: rocm
open source
ciflow/trunk
release notes: distributed (c10d)
ciflow/periodic

- Removes hard coding and helps in internal builds

cc @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport @dllehr-amd @jataylo @hongxiayang

module: rocm
triaged
open source
topic: not user facing
ciflow/inductor
ciflow/rocm

- Build is currently enabled only for hip_basic (cuda_basic)
- Sample build command for reference: `cmake ../ -DCMAKE_C_FLAGS="-Werror -Wno-deprecated-declarations -D__HIP_PLATFORM_HCC__=1" -DCMAKE_CXX_FLAGS="-Werror -Wno-deprecated-declarations -D__HIP_PLATFORM_HCC__=1" -DTP_ENABLE_SHM=OFF -DTP_ENABLE_CMA=OFF -DTP_USE_ROCM=ON -DTP_ENABLE_HIP_XTH=OFF -DTP_ENABLE_HIP_IPC=OFF -DTP_ENABLE_HIP_GDR=OFF -DTP_ENABLE_IBV=OFF -DTP_BUILD_TESTING=ON`

cla signed

- Add Hipify as a git submodule
- Trigger hipify from the cmake build
- `TP_USE_ROCM` controls the trigger and is set to ON when building on ROCm

cla signed

- Changes to control hipify of `CUDA_VERSION` to `HIP_VERSION`
- Use `GLOO_USE_ROCM` instead of `__HIP_PLATFORM_HCC__`
- Add `__HIP_PLATFORM_AMD__`, since `__HIP_PLATFORM_HCC__` is being deprecated
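The macro changes listed for this PR can be pictured with a small preprocessor sketch. This is only an illustration under stated assumptions, not the code from the PR: `EXAMPLE_HIP_PLATFORM_AMD`, `kUseRocm`, and `isRocmBuild()` are hypothetical names, and `GLOO_USE_ROCM` is assumed to be defined to `0` or `1` by the build system when it is defined at all.

```cpp
// Illustrative sketch only. EXAMPLE_HIP_PLATFORM_AMD, kUseRocm and
// isRocmBuild() are hypothetical names, not identifiers from gloo.

// Accept both the current platform macro and the deprecated one, so the
// same source builds against older and newer ROCm releases.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
#define EXAMPLE_HIP_PLATFORM_AMD 1
#else
#define EXAMPLE_HIP_PLATFORM_AMD 0
#endif

// Project-level switch. Assumption: the build system passes
// -DGLOO_USE_ROCM=1 (or 0); an empty definition would not compile here.
#if defined(GLOO_USE_ROCM) && GLOO_USE_ROCM
constexpr bool kUseRocm = true;
#else
constexpr bool kUseRocm = false;
#endif

// Code paths branch on the project switch rather than on the
// toolchain's platform macro.
inline bool isRocmBuild() { return kUseRocm; }
```

Keying project code on a project-owned macro such as `GLOO_USE_ROCM` means that a later rename or deprecation of the toolchain macro only has to be handled in one place.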

CLA Signed

### 🐛 Describe the bug

distributed/rpc/cuda/test_tensorpipe_agent | test_async_execution_nested_with_cuda_future | (__main__.TensorPipeTensorPipeAgentCudaRpcTest)
distributed/rpc/cuda/test_tensorpipe_agent | test_async_execution_with_cuda_future | (__main__.TensorPipeTensorPipeAgentCudaRpcTest)
distributed/rpc/cuda/test_tensorpipe_agent | test_basic_gloo_ckpt_always | (__main__.TensorPipePipeWithDDPTest)
distributed/rpc/cuda/test_tensorpipe_agent | test_basic_gloo_ckpt_except_last | (__main__.TensorPipePipeWithDDPTest)
distributed/rpc/cuda/test_tensorpipe_agent | test_basic_gloo_ckpt_never | (__main__.TensorPipePipeWithDDPTest)...

Unit Test Parity

### 🐛 Describe the bug

test_batchnorm_cudnn_nhwc | (__main__.TestNN)
-- | --
test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_False_cuda | (__main__.TestNNDeviceTypeCUDA)
test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_True_cuda | (__main__.TestNNDeviceTypeCUDA)
test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_False_cuda | (__main__.TestNNDeviceTypeCUDA)
test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_True_cuda | (__main__.TestNNDeviceTypeCUDA)
test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_False_cuda | (__main__.TestNNDeviceTypeCUDA)
test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_True_cuda | (__main__.TestNNDeviceTypeCUDA)
test_conv_backend_cudnn1d_has_bias_True_strided_True_contiguous_False_cuda...

Unit Test Parity