Nilesh M Negi
Unable to download data for minigo (https://github.com/mlperf/training/blob/master/reinforcement/tensorflow/minigo/ml_perf/get_data.py). Are there any alternate links? ``` Running: gsutil -m cp -r gs://minigo-pub/ml_perf/checkpoint/9 ml_perf/checkpoint Traceback (most recent call last): File "ml_perf/get_data.py", line 73, in...
https://github.com/mlperf/training_results_v0.6/tree/master/NVIDIA/benchmarks/resnet/implementations/mxnet/README.md The README's requirements list the MXNet 18.11-py3 NGC container, whereas the Dockerfile uses the MXNet 19.05-py3 NGC container.
Trying to use the end-of-file *RESULT* statements in logs on [training_results_v0.7/NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/](https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt) and [training_results_v0.7/NVIDIA/results/dgxa100_ngc20.06_pytorch/transformer/](https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/results/dgxa100_ngc20.06_pytorch/transformer). For gnmt: ``` $ for i in `ls NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/result_*` ; do grep -m1 "^RESULT" $i ; done...
The wikidumps download link referenced in BERT's [README.md](https://github.com/mlperf/training_results_v0.7/tree/master/NVIDIA/benchmarks/bert/implementations/pytorch) no longer exists. Is there an alternative link?
## Details **Work item:** Internal **What were the changes?** Enable the use of `amdclang++` instead of `hipcc` for building RCCL. **Why were the changes made?** - Update `CXX` and `C`...
- Added Dockerfile
- Updated README.md with instructions for using Dockerfile
## Details **Work item:** Internal **What were the changes?** - Modify RCCL build for Address Sanitizer (ASAN)-enabled builds to only target GPU architectures with `xnack+`. - Remove older GPU architectures...
## Details **Work item:** Internal **What were the changes?** Support custom `CMAKE_PREFIX_PATH` when building MSCCLPP **Why were the changes made?** `CMAKE_PREFIX_PATH` specified for RCCL build was not being passed to...
## Details **Work item:** Internal **What were the changes?** Update README on using RCCL with fewer than 8 MI300 GPUs and how to improve performance ## Approval Checklist ___Do not...
## Details **Work item:** Internal **What were the changes?** Update RCCL CHANGELOG for ROCm 6.2.x