
Additional backends

tzanio opened this issue Jan 03 '18 · 13 comments

  • [ ] Improve OCCA backend
  • [ ] Add MFEM backend — how to support backends that don’t support JIT and don’t run on the host?
  • [x] Add MAGMA backend?
  • [ ] Add OpenMP 4.5 backend?
  • [x] Add pure CUDA backend?
  • [x] Add HIP backend?

tzanio · Jan 03 '18

With the announcement that OLCF Frontier will use AMD CPUs and GPUs, we should try to get AMD support into our workflow. As on-node programming models, we can use HIP (an open-source CUDA-like model that compiles to both CUDA and ROCm, and can be produced almost automatically from CUDA using hipify-clang) or OpenMP-5 offload. Note that HIP does not currently support run-time compilation.
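
For a sense of how mechanical that translation is, here is a minimal sketch of the kind of rewrite hipify-clang performs (illustrative names, error checking omitted; each HIP call mirrors its CUDA counterpart one-to-one):

```cpp
#include <hip/hip_runtime.h>

// CUDA original:            hipify-clang output:
//   cudaMalloc(...)     ->    hipMalloc(...)
//   cudaMemcpy(...)     ->    hipMemcpy(...)
//   cudaFree(...)       ->    hipFree(...)
void copy_to_device(const float *h_x, size_t bytes) {
  float *d_x;
  hipMalloc((void **)&d_x, bytes);                    // was cudaMalloc
  hipMemcpy(d_x, h_x, bytes, hipMemcpyHostToDevice);  // was cudaMemcpy
  hipFree(d_x);                                       // was cudaFree
}
```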

HIP nominally compiles to CUDA with negligible overhead, but the toolchain needs to be installed to do so.

jedbrown · May 07 '19

OCCA:HIP supports run-time compilation.
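
Roughly what that looks like from the OCCA C++ API (a sketch; the file and kernel names here are illustrative):

```cpp
#include <occa.hpp>

int main() {
  // Select the HIP mode; OCCA JIT-compiles kernels for it at run time.
  occa::device device("{mode: 'HIP', device_id: 0}");

  // Compiled on first use and cached; the same .okl source serves all modes.
  occa::kernel addVectors = device.buildKernel("addVectors.okl", "addVectors");
  return 0;
}
```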

tcew · May 07 '19

Our OCCA backend is in serious need of a performance overhaul, so it would be great if we could also include OCCA:HIP.

jeremylt · May 07 '19

See: https://github.com/libocca/occa/blob/022b76829d43cbe20b719e6d5a54c9aff8fa178c/src/modes/hip/device.cpp#L230

tcew · May 07 '19

Yes, I don't think anything special needs to be done for /gpu/occa/hip versus /gpu/occa/cuda, though the OCCA backend needs attention. My comment on run-time compilation was with regard to @YohannDudouit's native CUDA implementation.
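
To be concrete, from the user's side the switch should be nothing more than the resource string passed to CeedInit (a minimal sketch):

```cpp
#include <ceed.h>

int main() {
  Ceed ceed;
  // Same program, different backend: swap "/gpu/occa/hip" for
  // "/gpu/occa/cuda" and nothing else changes on the user side.
  CeedInit("/gpu/occa/hip", &ceed);
  CeedDestroy(&ceed);
  return 0;
}
```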

I'm also curious about observed differences in performance characteristics between the Radeon Instinct and V100.

jedbrown · May 07 '19

You should follow up with Noel Chalmers. I believe he has run libP experiments with the Radeon Instinct.

tcew · May 07 '19

Thanks. @noelchalmers, can you share any experiments?

jedbrown · May 07 '19

Hi everyone. I'll try and chip in what I know for some of the points in this thread:

  • In addition to hipify-clang, which ports existing CUDA code by actually analyzing the code's semantics, there is also hipify-perl, a simple script that converts CUDA code to HIP and at least warns about sections it is unable to translate.

  • HIP does indeed support runtime compilation in the same way CUDA does. OCCA uses analogous API calls for its runtime compilation of CUDA and HIP. Documentation of what is and is not currently in the HIP API is a bit sparse at the moment; the HIP Porting Guide is a good resource in the meantime.

  • As for V100 vs Radeon Instinct performance: in micro-benchmarking we've been seeing bandwidth numbers in the 800-900 GB/s range for the MI-60s, and GFLOP numbers similar to the PCIe V100s.

  • I don't have any readily available performance numbers for any CEED-relevant benchmarking. My plan is to resurrect the bake-off problems in libp and do some performance analysis to get a better sense of what the Radeons can do compared to the V100s. The libp kernels rely heavily on shared memory bandwidth and cache performance, so it will be a good exercise in finding out how portable they are to Radeon.

noelchalmers · May 07 '19

Thanks, @noelchalmers. On run-time compilation, I don't see anything about porting NVRTC to HIP.

Are there any public clouds with Radeon Instinct (for continuous integration, etc.)?

jedbrown · May 07 '19

I just realized that you were referring to NVRTC when you mentioned runtime compilation.

No, HIP currently doesn't support any nvrtc* API calls. I'm not aware of any plans to add these features, but I will ask around. What HIP does support is loading compiled binaries using hipModuleLoad, which is analogous to cuModuleLoad, and finding/launching kernels from that binary.
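
In other words, the flow looks like this (a sketch; the file and kernel names are illustrative, error checking omitted):

```cpp
#include <hip/hip_runtime.h>

// Load a pre-compiled code object and launch a kernel from it.
void launch_from_binary(void *args[]) {
  hipModule_t module;
  hipFunction_t kernel;
  hipModuleLoad(&module, "kernels.hsaco");        // analogous to cuModuleLoad
  hipModuleGetFunction(&kernel, module, "axpy");  // analogous to cuModuleGetFunction
  hipModuleLaunchKernel(kernel,
                        256, 1, 1,  // grid dimensions
                        64, 1, 1,   // block dimensions
                        0,          // dynamic shared memory bytes
                        0,          // stream
                        args, nullptr);
  hipModuleUnload(module);
}
```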

I don't know of any public clouds I can point to using MI-25 or MI-60s yet. Maybe for some CI tests you could try compiling on some Vegas in a gpueater session? Not ideal, certainly.

noelchalmers · May 07 '19

Thanks. It looks like GPU Eater doesn't support docker-machine or Kubernetes so CI integration would be custom and/or not autoscaling, but it's something, so thanks.

jedbrown · May 07 '19

Yet another C++ layer, this one providing single source for CPU, OpenCL, and HIP/CUDA. https://github.com/illuhad/hipSYCL
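
For flavor, a minimal single-source sketch of the kind of SYCL code hipSYCL compiles for any of those targets (illustrative, based on the SYCL 1.2.1 API):

```cpp
#include <CL/sycl.hpp>
#include <vector>

int main() {
  std::vector<float> x(1024, 1.0f);
  {
    cl::sycl::queue q;  // device chosen by the hipSYCL compilation target
    cl::sycl::buffer<float, 1> xb(x.data(), cl::sycl::range<1>(x.size()));
    q.submit([&](cl::sycl::handler &h) {
      auto xa = xb.get_access<cl::sycl::access::mode::read_write>(h);
      // The same kernel source runs on CPU, OpenCL, HIP, or CUDA backends.
      h.parallel_for<class scale>(cl::sycl::range<1>(x.size()),
                                  [=](cl::sycl::id<1> i) { xa[i] *= 2.0f; });
    });
  }  // buffer destruction synchronizes and copies results back into x
  return 0;
}
```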

jedbrown · May 23 '19

While I still don't see it on the docs website, hiprtc was apparently merged a few months ago: https://github.com/ROCm-Developer-Tools/HIP/pull/1097. I thought we discussed this specifically at CEED3AM, and @noelchalmers and Damon were not aware that it existed. Is it something we should be trying now, or is the lack of documentation an indication that it's still in easter-egg mode?
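
If it does work, the flow should mirror NVRTC: compile a source string at run time, then feed the result to the module API discussed above (a sketch, untested; error checking omitted):

```cpp
#include <hip/hiprtc.h>
#include <hip/hip_runtime.h>
#include <vector>

void jit_and_load(const char *source) {
  // Compile HIP C++ source at run time, mirroring nvrtcCreateProgram et al.
  hiprtcProgram prog;
  hiprtcCreateProgram(&prog, source, "kernels.hip", 0, nullptr, nullptr);
  hiprtcCompileProgram(prog, 0, nullptr);  // compile options omitted

  // Extract the compiled code object.
  size_t code_size;
  hiprtcGetCodeSize(prog, &code_size);
  std::vector<char> code(code_size);
  hiprtcGetCode(prog, code.data());
  hiprtcDestroyProgram(&prog);

  // Hand it to the same module API used for pre-compiled binaries.
  hipModule_t module;
  hipModuleLoadData(&module, code.data());
}
```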

jedbrown · Sep 18 '19

I'll close this open-ended issue. There is an improved OCCA backend coming in #1043. I think at this point we can make new issues for specific backend requests.

jedbrown · Sep 06 '22