bdenhollander

Retested with Windows [HIP SDK 6.1.2](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html) and the compiler issue is resolved. `#pragma unroll 1` is no longer required.

```
C:\AMD\ROCm\6.1\bin>hipcc.bin.exe --version
HIP version: 6.1.40252-53f3e11ac
clang version 19.0.0git ([email protected]:Compute-Mirrors/llvm-project b3dbdf4f03718d63a3292f784216fddb3e73d521)...
```
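For reference, the retired workaround was a loop-level unroll suppression of the form sketched below (the kernel and loop are illustrative, not OpenMM's actual code):

```C++
// Illustrative HIP kernel only: `#pragma unroll 1` before a loop tells the
// compiler not to unroll it, which was the workaround for the earlier miscompile.
#include <hip/hip_runtime.h>

__global__ void scaleKernel(float* data, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * 4;
    #pragma unroll 1   // no longer needed with HIP SDK 6.1.2
    for (int j = 0; j < 4; ++j) {
        if (base + j < n)
            data[base + j] *= 2.0f;
    }
}
```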

Are you able to run the benchmarks included with OpenMM? Based on the path in the error message, the benchmark script would probably be in `/ccs/proj/bip109/frontier/conda/miniconda3_py310_23.3.1-0_2023_05_10/envs/openmm_8.1.0/Library/share/openmm/examples/benchmark.py`. Checking this would help...

@ex-rzr Are you able to test this change on your fleet of AMD GPUs?

Does `context.getDevice().getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>()` report 20 (WGPs) or 40 (CUs) for the RX 5700 XT in Linux? If it's 40, then the multiplier may need to be 8 on Linux instead of...
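As a standalone check, the reported value can be read straight from the runtime. This is a minimal sketch using the OpenCL C++ bindings, not OpenMM code (the header name varies between installs):

```C++
// Minimal sketch: print what each GPU reports for CL_DEVICE_MAX_COMPUTE_UNITS.
// On RDNA cards some drivers report WGPs and others CUs, which is the
// ambiguity in question.
#include <CL/opencl.hpp>  // older installs ship this as <CL/cl2.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    for (auto& platform : platforms) {
        std::vector<cl::Device> devices;
        platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);
        for (auto& device : devices)
            std::cout << device.getInfo<CL_DEVICE_NAME>() << ": "
                      << device.getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>()
                      << " compute units" << std::endl;
    }
    return 0;
}
```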

> Won't that reduce the threads more, not less?

Whoops, wrong operator. Fixed in the most recent edit.
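For context, the change under discussion derives the launch size from the reported compute-unit count, so a single wrong operator moves the thread count in the opposite direction from what was intended. A toy illustration (the variable names and the specific operators are hypothetical, not the actual patch):

```C++
// Toy illustration only: how one wrong operator changes the number of
// thread blocks derived from the reported compute-unit count.
#include <iostream>

int main() {
    int computeUnits = 20;        // e.g. WGPs reported for an RX 5700 XT
    int blocksPerUnit = 4;        // hypothetical tuning multiplier
    int intended = computeUnits * blocksPerUnit;  // 80 blocks
    int wrongOp  = computeUnits / blocksPerUnit;  // 5 blocks: far fewer threads
    std::cout << "intended: " << intended << ", wrong operator: " << wrongOp << std::endl;
    return 0;
}
```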

I attempted to reproduce the gbsa performance regression on my RX 6600 on Ubuntu 20.04.6, but I'm seeing a 1-3% regression rather than 10-20%. Below are results for v1 of the thread...

RX 6600 rerun on Windows 10.

| | Original | 16x Reduce Threads v1 | Ratio | 16x Reduce Threads v2 | Ratio | 16x No Reduce Threads | Ratio...

Updated benchmarks for OpenMM 8.1 on RX 6000 (OpenCL 3592.0) on Windows.

| Benchmark | OpenMM 8.1 ns/day | Tuned ns/day | Ratio |
|-----------|-------------------|--------------|-------|
| gbsa | 655.36 | 655.675 |...

@peastman Is there any chance you could benchmark this branch again on your 5700 XT?

I commented out the `enqueueReadBuffer` and the code that relied on it. Performance is through the roof on gbsa, 410 ns/day -> 3030 ns/day!

```C++
void OpenCLNonbondedUtilities::computeInteractions(int forceGroups, bool includeForces,...
```
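The speedup comes from removing a blocking device-to-host read from the per-step path. A generic OpenCL sketch of the pattern (not the actual OpenMM code; the buffer and kernel names are made up):

```C++
// Generic illustration: a blocking enqueueReadBuffer stalls the host until all
// preceding kernels finish, serializing every step; dropping it lets steps be
// enqueued back-to-back and keeps the GPU busy.
#include <CL/opencl.hpp>
#include <iostream>

void stepWithBlockingRead(cl::CommandQueue& queue, cl::Kernel& kernel,
                          cl::Buffer& counterBuffer, cl::NDRange global) {
    queue.enqueueNDRangeKernel(kernel, cl::NullRange, global, cl::NullRange);
    cl_uint count = 0;
    // CL_TRUE makes this a blocking read: the host waits for the kernel and
    // the 4-byte transfer before it can enqueue the next step.
    queue.enqueueReadBuffer(counterBuffer, CL_TRUE, 0, sizeof(cl_uint), &count);
    std::cout << "count read back: " << count << std::endl;
}

void stepWithoutRead(cl::CommandQueue& queue, cl::Kernel& kernel, cl::NDRange global) {
    // Without the readback the call returns immediately; any host-side logic
    // that depended on the value has to be dropped or deferred, as in the
    // experiment above.
    queue.enqueueNDRangeKernel(kernel, cl::NullRange, global, cl::NullRange);
}
```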