ROC_SHMEM
ROC_SHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
* Imports dlmalloc (latest version, MIT licensed)
* Creates an encapsulation class DLMalloc that exposes only the relevant functionality; to prevent using non-static/templated members of the parent class, we use the mspace...
- Added `--map-by numa` flag
- Added `--timeout` flag
- Added environment variable to enable/disable get tests
Typical output looks like:
```
salloc -whostname -N1 -n2 --gpus-per-task=1 -c6 ../rocSHMEM/scripts/functional_tests/driver.sh tests/functional_tests/rocshmem_example_driver rma logs
mpirun -n 2 -mca pml ucx -x ROCSHMEM_MAX_NUM_CONTEXTS=1 tests/functional_tests/rocshmem_example_driver -a 2 -w 1 -z 1 -s...
```
This bug fix is in develop (#31), but it has not been incorporated into ROCm 6.4.x.
## Motivation
Add experimental support for gfx1201 architectures (Radeon RX 9070 and 9070XT, Radeon AI PRO 9700).
## Technical Details
Initial experiments with gfx12-based ISAs.
## Test Result
After...
## Motivation
1. Remove unused code.
2. Enable removing the switch from the critical path in memcpy_lane/wave/wg?
## Technical Details
Remove the LOAD-STORE macros and replace usages of the STORE macro with its definition...
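Item 2 asks whether the switch can be taken off the critical path of memcpy_lane/wave/wg. One general technique for that is to turn a runtime selector into a template parameter so the compiler resolves it once. The sketch below illustrates only that technique and is not the code changed in this PR: `Scope`, `stride_of`, `kStride`, and the copy helpers are hypothetical names, the strides (64-lane wave, 256-thread work-group) are assumptions, and the strided loops emulate cooperating GPU threads on the host.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical copy granularities; names are illustrative, not rocSHMEM's API.
enum class Scope { Lane, Wave, WorkGroup };

// Runtime selection: this switch is evaluated on every call (the hot path).
inline int stride_of(Scope s) {
  switch (s) {
    case Scope::Lane:      return 1;
    case Scope::Wave:      return 64;   // assumption: wave64
    case Scope::WorkGroup: return 256;  // assumption: 256-thread work-group
  }
  return 1;
}

// Compile-time selection: the same mapping, resolved by the compiler.
template <Scope S>
constexpr int kStride = S == Scope::Lane ? 1 : (S == Scope::Wave ? 64 : 256);

// One cooperating thread's share of the copy: bytes tid, tid + T, tid + 2T, ...
inline void copy_part(int tid, int T, char* dst, const char* src, std::size_t n) {
  assert(T > 0 && tid >= 0);
  for (std::size_t i = static_cast<std::size_t>(tid); i < n; i += static_cast<std::size_t>(T)) {
    dst[i] = src[i];
  }
}

// Before: the stride is looked up through the switch on every call.
inline void memcpy_runtime(Scope s, int tid, char* dst, const char* src, std::size_t n) {
  copy_part(tid, stride_of(s), dst, src, n);
}

// After: the stride is a compile-time constant, so no branch remains.
template <Scope S>
inline void memcpy_static(int tid, char* dst, const char* src, std::size_t n) {
  copy_part(tid, kStride<S>, dst, src, n);
}

// Host emulation: run every cooperating thread one after another.
inline void emulate_runtime(Scope s, char* dst, const char* src, std::size_t n) {
  for (int tid = 0; tid < stride_of(s); ++tid) memcpy_runtime(s, tid, dst, src, n);
}

template <Scope S>
void emulate_static(char* dst, const char* src, std::size_t n) {
  for (int tid = 0; tid < kStride<S>; ++tid) memcpy_static<S>(tid, dst, src, n);
}
```

Both `emulate_static<Scope::Wave>(dst, src, n)` and `emulate_runtime(Scope::Wave, dst, src, n)` copy the same bytes; the difference is that in the static version each thread's loop stride is a constant, so the per-call branch disappears.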
## Motivation
Increase the maximum message rate by using all enabled threads in the wave for polling completions.
## Technical Details
Use all available threads for polling the completion queue (CQ) to...
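Polling with every lane can be illustrated on the host. The following is a minimal C++ sketch, not rocSHMEM's implementation: it assumes a hypothetical `CqEntry` with a single `valid` flag (real completion-queue formats differ) and a 64-lane wave, and it emulates the wave-wide ballot with a loop. Each lane inspects a different entry, and the contiguous run of completions starting at the consumer index is retired in one poll instead of one entry per poll.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical completion-queue entry: the producer sets `valid`,
// and the consumer clears it after retiring the completion.
struct CqEntry {
  bool valid = false;
};

constexpr int kWaveSize = 64;  // assumption: wave64

// Baseline: a single lane inspects one entry per poll.
int poll_single_lane(std::vector<CqEntry>& cq, std::uint32_t& head) {
  assert(!cq.empty());
  CqEntry& e = cq[head % cq.size()];
  if (!e.valid) return 0;
  e.valid = false;
  ++head;
  return 1;
}

// Wave-wide poll: lane `l` inspects entry head + l. On a GPU the per-lane
// results would be combined with a ballot; here a loop builds the same mask.
int poll_wave(std::vector<CqEntry>& cq, std::uint32_t& head) {
  assert(!cq.empty());
  const std::size_t depth = cq.size();
  std::uint64_t mask = 0;
  for (int lane = 0; lane < kWaveSize && static_cast<std::size_t>(lane) < depth; ++lane) {
    if (cq[(head + lane) % depth].valid) mask |= std::uint64_t{1} << lane;
  }
  // Retire only the contiguous run starting at `head` so completions stay in order.
  int n = 0;
  while (n < kWaveSize && ((mask >> n) & 1u)) ++n;
  for (int i = 0; i < n; ++i) cq[(head + i) % depth].valid = false;
  head += static_cast<std::uint32_t>(n);
  return n;
}
```

With 40 consecutive completions pending, one `poll_wave` call retires all 40, whereas `poll_single_lane` would need 40 calls. A completion that sits after a gap stays pending until the gap is filled, which preserves ordering.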
### Suggestion Description
## Description
Given the overall strategy of ROCm libraries in the past, developers need to take a closer look at the NVIDIA NVSHMEM implementation (sharing software development foundations). Currently...
## Motivation
Redid the cherry-picks after an accidental force push.
## Motivation
Users want a tool to compare performance between version X and version Y of our code.
## Design
A Python matplotlib script that can be used to compare the...
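The proposed tool is a Python matplotlib script; the plotting is out of scope here, but the core comparison such a tool would plot can be sketched. Below is a minimal C++ sketch, assuming each version's results are available as a map from message size (bytes) to a "higher is better" metric such as bandwidth. `Results` and `percent_change` are hypothetical names, not part of the proposed script.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical input: benchmark results keyed by message size in bytes,
// with values in any "higher is better" unit (e.g. GB/s).
using Results = std::map<std::size_t, double>;

// Percent change of version Y relative to version X for each message size
// present in both runs: 100 * (y - x) / x. Other sizes are skipped.
std::map<std::size_t, double> percent_change(const Results& x, const Results& y) {
  std::map<std::size_t, double> out;
  for (const auto& kv : x) {
    auto it = y.find(kv.first);
    if (it == y.end() || kv.second == 0.0) continue;  // no common point, or undefined ratio
    out[kv.first] = 100.0 * (it->second - kv.second) / kv.second;
  }
  return out;
}
```

Comparing only the sizes measured in both runs avoids plotting spurious gaps when one version's benchmark sweep covered a different range.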