
Investigate compilation to GPUs

Open · Praetonus opened this issue 8 years ago • 6 comments

LLVM has experimental NVPTX and AMDGPU backends. Getting Pony to run on these as a proof of concept would be great.

Praetonus · Oct 17 '16

Just want to comment saying that this would be awesome - I think the actor paradigm with capabilities maps really neatly onto GPUs and their compile-time types of memory (const, val, etc.).

ryanai3 · Oct 19 '16

I'd really like to do some work on this, but I don't think I'm particularly qualified. If any tips or pointers could be given on where to start, I'd be happy to give it a go.

Though I do wonder: most GPU-based code (from what little research I've done) works by having a host program send processing kernels to the GPU and fetch the results later, so how should this behaviour be written in Pony code? Or should each new actor (beyond Main, which can be the host) be forked as a new work item in the corresponding work group on the GPU? (e.g. a newly created OutStream actor becomes a new work item in the existing OutStream work group)
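
For context, here is a minimal sketch of that host/device split in today's Pony, with no GPU involved and every name invented: Main plays the host, a worker actor stands in for the kernel, and results come back asynchronously.

actor Kernel
  be run(input: Array[U32] val, host: Main) =>
    // Stand-in for device work: sum the input, then send it back.
    var total: U32 = 0
    for v in input.values() do total = total + v end
    host.receive(total)

actor Main
  let _env: Env

  new create(env: Env) =>
    _env = env
    let k = Kernel
    k.run([1; 2; 3], this)

  be receive(result: U32) =>
    _env.out.print(result.string())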

mpbagot · Jan 05 '19

Marked as "needs discussion during sync" to make sure that folks weigh in on @mpbagot's question.

SeanTAllen · Jan 05 '19

We discussed this on today's sync call. I didn't take notes on all of what was said, and I personally don't know a lot about it, but if you listen to the last five minutes of the call, you can hear @sylvanc's comments on it.

One place to start would be to introduce an annotation (using Pony's \annotation\ syntax) at the function level to specify that the function targets the GPU, with the parameters going into a render target / texture buffer (or some other way of communicating them), and the return value "rendered" to the output render target.
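
Purely as an illustration (the \gpu\ annotation and the render-target plumbing are hypothetical; none of this exists in ponyc today), such a function might look like:

// \gpu\ would mark the function for GPU compilation; each parameter
// value would be read from the input buffer, and the return value
// written to the output render target.
fun \gpu\ brighten(pixel: F32): F32 =>
  (pixel * 1.1).min(1.0)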

jemc · Jan 08 '19

It might be worth considering trying to target SPIR-V, either directly or through the LLVM<->SPIR-V converter (https://github.com/KhronosGroup/SPIRV-LLVM), because SPIR-V is an intermediate representation that can express OpenCL kernels, which lets vendor drivers take over and compile it down to native code.

cjdelisle · Mar 18 '19

There are quite a few issues I'm finding whilst trying to conceptualise how this could be done. Assuming only functions can be parallelised (for simplicity's sake), you have to consider the following:

  1. How should GPU functions be called syntactically? Consider a simple function like the one below (gpufunc is a stand-in for the function annotation). It would be called with two equal-length arrays of input, and each pair of (a, b) values would be processed in a new thread on the GPU. Syntactically, what would a function call look like?
fun \gpufunc\ func_a(a: U32, b: U32): U32 =>
  a + b

Should it be the same as all function calls, with array/iterable inputs?

let result: Array[U32] = func_a([1; 2; 3], [4; 5; 6])

Or should some other form of call be used, like the @ prefix on FFI calls, to make GPU function calls noticeably distinct from CPU function calls? (See the sketch below.)
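
For example, a distinct marker by analogy with @ might look something like this ($ is invented purely for illustration):

// Hypothetical syntax: $ marks a parallel GPU dispatch, so scalar
// calls and parallel calls to func_a can never be confused.
let result: Array[U32] = $func_a([1; 2; 3], [4; 5; 6])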

  2. GPU function calls from within GPU functions. I would assume that a call from inside another GPU function would pass a single value of data. However, this means the function's parameter types would need to change depending on where it is called from, which is both somewhat confusing and complicates the syntactic processing; an alternate syntax for parallel calls would avoid this.
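
To illustrate the ambiguity (hypothetical gpufunc annotation as above):

// Inside another GPU function, func_a is applied to scalars...
fun \gpufunc\ func_b(a: U32, b: U32): U32 =>
  func_a(a, b) * 2

// ...while host code would call the same func_a with arrays, so the
// effective signature depends on the call site:
// let r: Array[U32] = func_a([1; 2], [3; 4])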

  3. CPU function calls from within GPU functions. Would CPU functions be compiled twice, once for the GPU and once for the CPU? Or would GPU functions simply be restricted to calling only other GPU functions, to sidestep this?
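
A sketch of the question (hypothetical annotation as before):

// An ordinary CPU function...
fun double(x: U32): U32 => x * 2

// ...called from a GPU function: does double get compiled a second
// time for the device, or is this call rejected outright?
fun \gpufunc\ func_c(a: U32): U32 =>
  double(a)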

  4. FFI calls from GPU functions. These would need to be prevented, since there is no feasible way to compile arbitrary C libraries to run on the GPU.
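
For instance, something like this would have to be a compile-time error (FFI declaration shown for completeness; the annotation remains hypothetical):

use @printf[I32](fmt: Pointer[U8] tag, ...)

// libc does not exist on the device, so an FFI call inside a GPU
// function cannot be compiled.
fun \gpufunc\ func_d(a: U32): U32 =>
  @printf("value: %u\n".cstring(), a)
  a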

  5. Race conditions across multiple threads. Consider a case where a GPU function modifies a value in the class object it is called from. In traditional CUDA, all threads would write to that location in an indeterminate order. In Pony, I imagine this behaviour would violate various guarantees. It would be easiest to require that GPU functions be pure, as this solves both the race conditions and the problem of modifying CPU objects from the GPU.
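
Concretely (hypothetical annotation again):

class Accumulator
  var total: U32 = 0

  // If every GPU thread ran this against the same receiver, they
  // would all race on total, which Pony's type system is designed
  // to rule out. Requiring purity avoids the problem entirely.
  fun \gpufunc\ ref add(x: U32) =>
    total = total + x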

  6. Introduction of the SPIRV-LLVM library to ponyc. As mentioned by cjdelisle, the simplest method for cross-platform GPU compilation would be to take the functions' LLVM IR and translate it to SPIR-V using https://github.com/KhronosGroup/SPIRV-LLVM-Translator/. This is the most user-friendly and simplest implementation of this functionality that I can see; however, it would mean pulling an additional dependency into ponyc for functionality that only a small number of Pony programs will use.

mpbagot · Oct 29 '21