openmm icon indicating copy to clipboard operation
openmm copied to clipboard

One loop hangs Amoeba on Windows HIP

Open bdenhollander opened this issue 1 year ago • 5 comments

After hundreds of display driver resets, I finally tracked down what hangs in solveDIISMatrix when iteration > 0 on Windows HIP.

https://github.com/openmm/openmm/blob/065e34ab56259d56dae81655f14fcfcbdb077788/plugins/amoeba/platforms/common/src/kernels/multipoleInducedField.cc#L765-L769

The issue is resolved by adding #pragma unroll 1. This hang only happens on Windows HIP. Ubuntu 20.04 compiles kernels using Clang 15.x and Windows HIP bundles Clang 17.0.0 and could affect how the kernel is optimized by /O3. Since this is OS dependent and the code is in common, I'm not sure what the best way to implement the fix would be.

RX 6600 on Windows HIP. The massive amoebagk improvement is inline with previously reported Linux HIP results.

Benchmark OpenCL ns/day HIP ns/day Ratio
amoebagk 1.38817 15.4375 1112.08%
amoebapme 3.61985 6.11883 169.04%

bdenhollander avatar Aug 18 '23 12:08 bdenhollander

That sounds like a compiler bug. It's a really simple loop, and unrolling should never affect correctness. We should probably notify AMD about it.

@ex-rzr do you have any insight about this?

peastman avatar Aug 18 '23 17:08 peastman

I spent some time trying to build and run OpenMM-HIP on Windows with an older pre-release HIP SDK version a few months ago. I didn't finish my attempt then because it was my personal initiative (OpenMM-HIP is supported by AMD developers currently) and needed to switch to another project. And I definitely remember some issues with AMOEBA but I'm not sure whether they were in tests or benchmarks and what kind of issues (hangs or just failures).

@bdenhollander I see that you've done a lot of work on supporting Windows: https://github.com/bdenhollander/openmm-hip/commits/windows-compatibility I'll try to find time next week and build and run your branch on our GPU. Unfortunately it's also RDNA. If it's indeed a compiler bug then trying it on another architecture may provide some additional info.

Good job with narrowing it down, btw! Compiler issues are incredibly hard to debug because they often disappear with a minor change in code like adding printf or commenting something.

Further steps to understand what happens may be running with OPENMM_SAVE_TEMPS=pathwheresavetempsto on Linux and Windows and comparing corresponding .s files. If you are going to do this, please upload them, I'm also curious.

As a workaround I have a crazy idea to "patch" a source code before compiling in HipContext::createModule: find for (k = 0; k < rank; k++) { in a string and add #pragma unroll 1\n before it. It sounds a bit stupid but it does not require modifying main sources and hence depending on a very recent version of OpenMM.

@jdmaia Have you heard about similar issues on HIP SDK for Windows?

ex-rzr avatar Aug 18 '23 18:08 ex-rzr

Deja vu. This swapping code reminds me one compiler bug. We had an issue with one test in rocPRIM: https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/rocprim/include/rocprim/warp/detail/warp_sort_stable.hpp#L105

As you can see, the code is very similar. So I found the exact place in a compiled code where the compiler missed one instructions (or more precisely: incorrectly removed one instruction). Then AMD's compiler developer fixed this bug: https://reviews.llvm.org/rGdf1782c2a2af9938ba4c5bacfab20d1ddebc82dd

I wonder if this is indeed the same compiler bug and if the compiler from HIP SDK for Windows includes the fix. @bdenhollander Could you check clang's version (with a commit hash) with hipconfig?

ex-rzr avatar Aug 18 '23 19:08 ex-rzr

I generated .s files with and without #pragma unroll 1 on Windows. Diffing ignoring whitespace and numbers shows differences in lines 3561-4090.

unroll-windows-hip-amdgcn-amd-amdhsa-gfx1032.s.txt hang-windows-hip-amdgcn-amd-amdhsa-gfx1032.s.txt

hipcc.bin.exe prints clang version 17.0.0 ([email protected]:Compute-Mirrors/llvm-project e3201662d21c48894f2156d302276eb1cf47c7be) when it launches.

bdenhollander avatar Aug 18 '23 19:08 bdenhollander

Retested this with Windows HIP SDK 5.7.1 and #pragma unroll 1 is still needed. The clang hash reported by hipcc.bin.exe has changed since 5.5.1.

C:\AMD\ROCm\5.7\bin\hipcc.bin.exe --version
HIP version: 5.7.32000-193a0b56e
clang version 17.0.0 ([email protected]:Compute-Mirrors/llvm-project 6e709f613348e5258188527d11ee8d78376f26b7)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\AMD\ROCm\5.7\bin

bdenhollander avatar Jan 10 '24 01:01 bdenhollander

Retested with Windows HIP SDK 6.1.2 and the compiler issue is resolved. #pragma unroll 1 is no longer required.

C:\AMD\ROCm\6.1\bin>hipcc.bin.exe --version
HIP version: 6.1.40252-53f3e11ac
clang version 19.0.0git ([email protected]:Compute-Mirrors/llvm-project b3dbdf4f03718d63a3292f784216fddb3e73d521)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\AMD\ROCm\6.1\bin

bdenhollander avatar Jul 17 '24 01:07 bdenhollander