Anton Gorenko
> I am running the docker using this command
>
> ```
> docker run --rm -it --device=/dev/kfd --device=/dev/dri -v /home/aby/aMD_benchmark:/app__/ --group-add video my-rocm bash
> ```

Then `--security-opt seccomp=unconfined`...
This difference in performance is really strange. Usually it's the other way around: HIP is faster than OpenCL. Can you run standard benchmarks and share the results? ``` cd examples...
Thanks a lot! So all benchmarks with PME are very slow for unknown reasons. Older RDNA GPUs are much faster (about 3x in amber20 cases). This needs investigation, I have...
Preliminary results of my investigation:

1. Indeed, this happens in PME kernels, `gridSpreadCharge` in particular.
2. This kernel uses float32 atomic add on GPUs supporting it or int64 atomic add...
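To illustrate the difference between the two accumulation strategies mentioned above, here is a minimal Python sketch, not the actual kernel code. It assumes a fixed-point scale of 2^32 (the `SCALE` constant and all function names are hypothetical), and it uses Python's float64 and unbounded integers in place of float32 and int64 GPU atomics. It only shows that integer (fixed-point) accumulation gives the same result regardless of the order in which contributions arrive, which is not guaranteed for floating-point accumulation.

```python
import random

# Assumption: 32 fractional bits; the scale used by the real kernel may differ.
SCALE = 1 << 32


def to_fixed(value: float) -> int:
    """Convert one charge contribution to a fixed-point integer (truncating)."""
    return int(value * SCALE)


def accumulate_fixed(contributions):
    """Sum contributions as integers, mimicking an integer atomic add."""
    total = 0
    for c in contributions:
        total += to_fixed(c)
    return total


def accumulate_float(contributions):
    """Sum contributions as floats, mimicking a floating-point atomic add."""
    total = 0.0
    for c in contributions:
        total += c
    return total


random.seed(0)
contributions = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

# Simulate threads reaching the same grid point in a different order.
shuffled = contributions[:]
random.shuffle(shuffled)

fixed_a = accumulate_fixed(contributions)
fixed_b = accumulate_fixed(shuffled)
float_a = accumulate_float(contributions)

print("fixed-point sums equal:", fixed_a == fixed_b)
print("fixed-point result:", fixed_a / SCALE)
print("float result:      ", float_a)
```

Integer addition is associative, so the fixed-point total does not depend on thread ordering; the price is a small quantization error per contribution and a limited value range.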
I returned to the investigation. It turned out that some test hangs, which I thought were related to RDNA4, also happen on other GPUs. See https://github.com/openmm/openmm/pull/4959 My micro-benchmarks show that...
I have run all benchmarks with fixed point charge spreading and the workaround for 10 minutes each (`python3 benchmark.py --verbose --style=table --platform=HIP --precision=single --seconds=600`) without any issues. The performance is...
I think there is no harm in including it: the workaround affects only RDNA4, and it fixes both performance and correctness (all tests pass on RDNA4). And even I or someone...
Hi all! I wanted to inform you that we are working on WMMA support for FMHA: #2528. There are still some things to do (see the list in the PR)...
I confirm that I want to be a maintainer.
I just wanted to note that you can build with the default gcc (no need to pass `-DCMAKE_CXX_COMPILER=hipcc` and set `DEVICE_LIB_PATH`); this means that clangdev and lld (and, likely, compiler-rt)...