Rahulkumar Gayatri
The PR adds an option to use the `cudaFuncCachePreferEqual` cache configuration instead of always choosing `cudaFuncCachePreferShared`.
The PR enables tiling for MDRange policies in the OpenMPTarget backend. Previously, tiling for MDRange policies was disabled because of a bug in older LLVM compilers.
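Tiling splits a multidimensional iteration space into fixed-size blocks that are processed one block at a time, which is what an MDRange tile size controls. The following is a minimal, backend-agnostic sketch of a 2D tiled traversal; the helper name `tiled_for_2d` is hypothetical, and the code does not reproduce the OpenMPTarget implementation.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not a Kokkos API): call body(i, j) for every index in
// [0, n0) x [0, n1), finishing one tile0 x tile1 block before starting the next.
// Tile sizes must be positive; edge tiles are clamped when n0 or n1 is not a
// multiple of the tile size.
template <class Body>
void tiled_for_2d(std::size_t n0, std::size_t n1, std::size_t tile0,
                  std::size_t tile1, Body&& body) {
  for (std::size_t t0 = 0; t0 < n0; t0 += tile0) {
    for (std::size_t t1 = 0; t1 < n1; t1 += tile1) {
      // Bounds of the current tile, clamped to the iteration space.
      const std::size_t end0 = std::min(t0 + tile0, n0);
      const std::size_t end1 = std::min(t1 + tile1, n1);
      // Visit only the indices inside this tile.
      for (std::size_t i = t0; i < end0; ++i)
        for (std::size_t j = t1; j < end1; ++j)
          body(i, j);
    }
  }
}
```

For a 2 x 4 space with 2 x 2 tiles, the row-major linear indices are visited in the order 0, 1, 4, 5, 2, 3, 6, 7: the first tile completes before the second begins. In Kokkos, the tile sizes of an `MDRangePolicy` are passed as an optional third constructor argument after the lower and upper bounds.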
The PR addresses issue #6508. It adds the following:
* Macros to mark the reference-counting branches of views with the C++20 `likely` and `unlikely` attributes.
* ...
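A minimal sketch of the idea, assuming a simple atomic reference count: the macro names `SKETCH_LIKELY`/`SKETCH_UNLIKELY` and the `RefCount` class are hypothetical, not the names used in the PR. The macros expand to the C++20 `[[likely]]`/`[[unlikely]]` attributes when the compiler supports them and to nothing otherwise. The branch that frees the allocation is marked unlikely on the assumption that releasing the last reference is the rare case.

```cpp
#include <atomic>

// Hypothetical macros: C++20 branch-likelihood attributes when available,
// empty otherwise so the code still compiles as C++17.
#if defined(__has_cpp_attribute) && __cplusplus >= 202002L
#if __has_cpp_attribute(likely) && __has_cpp_attribute(unlikely)
#define SKETCH_LIKELY [[likely]]
#define SKETCH_UNLIKELY [[unlikely]]
#endif
#endif
#ifndef SKETCH_LIKELY
#define SKETCH_LIKELY
#define SKETCH_UNLIKELY
#endif

// Hypothetical reference count guarding a heap allocation (an illustration,
// not the Kokkos view tracking code).
class RefCount {
 public:
  explicit RefCount(int* data) : data_(data) {}

  void increment() { count_.fetch_add(1, std::memory_order_relaxed); }

  // Returns true if this call released the last reference and freed the data.
  bool decrement() {
    if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) SKETCH_UNLIKELY {
      // Last reference released: free the allocation (assumed rare path).
      delete data_;
      data_ = nullptr;
      return true;
    } else SKETCH_LIKELY {
      // Other references remain (assumed common path).
      return false;
    }
  }

  int use_count() const { return count_.load(std::memory_order_relaxed); }

 private:
  std::atomic<int> count_{1};
  int* data_;
};
```

Because the attributes are only optimization hints, the program behaves the same with or without them; only the compiler's branch layout may differ.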
I am trying to build mpich/4.2.0 on Perlmutter with clang as the base compiler for C/C++ and CUDA. Versions used: cuda/12.2, clang/17.0.6. However, I got the same issue...
The PR uses the `atomic` construct from the OpenMP specification instead of DESUL atomics for llvm/18, because the DESUL atomics cause compiler segfaults.
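As a host-side sketch of the OpenMP `atomic` construct, the hypothetical `histogram` function below increments shared counters with `#pragma omp atomic update`. It is not the backend code from the PR; the same construct is also permitted inside `target` regions, which is where the OpenMPTarget backend would use it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: count values into bins, using an OpenMP atomic
// update (rather than a library atomic) to avoid races on shared counters.
// Without OpenMP enabled (e.g. no -fopenmp), the pragmas are ignored and the
// loop simply runs serially with the same result.
std::vector<int> histogram(const std::vector<int>& values, int num_bins) {
  std::vector<int> bins(num_bins, 0);
  int* b = bins.data();
  const long n = static_cast<long>(values.size());
#pragma omp parallel for
  for (long i = 0; i < n; ++i) {
    const int bin = values[i] % num_bins;  // values assumed non-negative
#pragma omp atomic update
    b[bin] += 1;
  }
  return bins;
}
```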
Problem Description: I am trying to build hipfft/rocm-5.5.1 on the NVIDIA A100 GPUs available on the Perlmutter supercomputer. I already have cuda/12.2 and the corresponding cuFFT in my path. There...
List of features that are still missing in the OpenMPTarget backend:
1. User-defined reduction via init/join
2. `UniqueTokenScope::Instance`
3. AtomicViews
4. Big atomics: atomics over non-native data...
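The first missing feature, a user-defined reduction via init/join, can be sketched in plain C++. The `MinMaxFunctor` type and the serial `chunked_reduce` driver are hypothetical stand-ins rather than Kokkos code: each chunk mimics one thread's partial result, which starts from `init` and is merged into the final result with `join`.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct MinMax {
  int min_val;
  int max_val;
};

// Hypothetical reduction functor with init/join members.
struct MinMaxFunctor {
  using value_type = MinMax;
  const std::vector<int>* data;

  // Identity element: any real value will replace these.
  void init(value_type& v) const {
    v.min_val = std::numeric_limits<int>::max();
    v.max_val = std::numeric_limits<int>::lowest();
  }
  // Merge one partial result into another.
  void join(value_type& dst, const value_type& src) const {
    dst.min_val = std::min(dst.min_val, src.min_val);
    dst.max_val = std::max(dst.max_val, src.max_val);
  }
  // Fold element i into a partial result.
  void operator()(std::size_t i, value_type& v) const {
    v.min_val = std::min(v.min_val, (*data)[i]);
    v.max_val = std::max(v.max_val, (*data)[i]);
  }
};

// Serial stand-in for a parallel backend (chunk must be positive): every chunk
// gets its own init'ed partial, and the partials are combined with join.
template <class Functor>
typename Functor::value_type chunked_reduce(std::size_t n, std::size_t chunk,
                                            const Functor& f) {
  typename Functor::value_type result;
  f.init(result);
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    typename Functor::value_type partial;
    f.init(partial);
    const std::size_t end = std::min(begin + chunk, n);
    for (std::size_t i = begin; i < end; ++i) f(i, partial);
    f.join(result, partial);
  }
  return result;
}
```

Because `join` only ever sees results that started from `init`, the final value is independent of how the range is split into chunks, which is the property a parallel backend relies on.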
Draft: The PR attempts to fix the sporadic "No GPU found" errors by updating the Docker setup with options that build for the NVIDIA target architecture.
Problem Description: I was following the instructions provided [here](https://rocm.docs.amd.com/projects/HIP/en/latest/install/install.html#installation) to install HIP, and I ran into the following issue:
```bash
cmake -DHIP_COMMON_DIR=/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip \
  -DHIP_PLATFORM=nvidia \
  -DCMAKE_INSTALL_PREFIX=/global/cfs/cdirs/nstaff/rgayatri/software/hip/clr/build/build/install \
  -DHIP_CATCH_TEST=0 \
  -DCLR_BUILD_HIP=ON \
  -DCLR_BUILD_OCL=OFF \
  -DHIPNV_DIR=/global/cfs/cdirs/nstaff/rgayatri/software/hip/hipother/hipnv...
```
Co-authored-by: Dong Hun Lee @ldh4
The PR works around an issue with the CrayClang compiler, where a bitwise comparison is needed to evaluate the presence of `__AVX2__`...