Add support for AMD GPUs via HIP
This is a work-in-progress branch to add support for AMD GPUs via HIP, AMD's very nearly CUDA-compatible API. While this isn't ready to go yet, it's far enough along that I wanted to get feedback on the approach I'm taking.
The basic idea of HIP is to provide CUDA-compatible APIs that are identical except for naming. In most cases you can find-and-replace `cuda` with `hip` and things will work.
Rather than literally doing that transformation to the code (the way AMD seems to want you to do it), I'm using a header that wraps `cuda.h` / `hip/hip_runtime.h`. In it I add a translation of the form `#define cudaX hipX` for each `X` API call used in the code. The advantage of going this route is that the vast majority of the code does not need to change at all, at least for a functionally correct version. I haven't looked at performance at all yet, and there may be more work to do there.
I also made a few minor changes to the build system, mostly to provide enough configuration flexibility that site files are sufficient to build against HIP. I provide an example site file in `sites/make.inc.olcf_spock` for Spock.
Status:

- With this configuration, I'm able to build and run on Spock. I am testing against ROCm 4.5.0.
- Running `make site=olcf_spock check` currently gives me failures in the following tests. All other tests are passing.

  ```
  bin/cufinufft3d1_test 4 15 15 15 2048 1e-3
  bin/cufinufft3d1_test 4 15 15 15
  bin/cufinufft3d1_test_32 4 15 15 15 2048 1e-3
  bin/cufinufft3d1_test_32 4 15 15 15
  ```

- I had to remove `pycuda` from `python/cufinufft/requirements.txt` because it doesn't exist in HIP (PyCUDA has not been ported, and I'm not sure whether it will be). I'm not sure how to deal with this at the moment, since I'm not sure `requirements.txt` has an optional dependency syntax.
I'd appreciate a review of the basic approach I'm using, and if anyone has advice on how to address the remaining test failures, that would be helpful. Thanks!
P.S. This may also be sufficient to get Intel GPUs working via HIPCL, though I haven't tested that.
cc @mari-sosa
I think the developer installation instructions are technically correct as they stand: https://github.com/flatironinstitute/mountainsort_examples
Note the step: `export PYTHONPATH=[fill-in-path]/ml_ms4alg:$PYTHONPATH`
After doing that, `ml-link-python-module` should work, and it does the same thing as manually creating the symbolic link.
However, I agree this is confusing. Should we change these docs to a `ln -s ...` command? Or are there other places where this should be clarified? @tjd2002