
GPU Support via OpenCL

Open Pencilcaseman opened this issue 3 months ago • 2 comments

This is very much a work-in-progress, but I wanted to know how this approach fits with the rest of the codebase. I realise it's not the most "rusty" implementation, but I think it could be abstracted away quite nicely.

There is a fair amount of unnecessary/unclean code, but that can be cleaned up pretty quickly. The changes made here only allow for binary operations (+ - * /) on contiguous ArrayBase references (see the example below). With more work, it could be expanded to support almost everything the CPU "backend" supports.

To use OpenCL, enable the `opencl` Cargo feature.

Here is an example:

// Unfortunately, OpenCL requires some initialisation before it can be used.
// There are currently no checks on this, but they can be easily added.
ndarray::configure();

// Note that the result of `move_to_device` is a `Result<_, OpenCLErrorCode>`, so errors
// can be handled correctly
let x = ndarray::Array2::<f32>::from_shape_fn((3, 4), |(r, c)| (c + r * 4) as f32)
    .move_to_device(ndarray::Device::OpenCL)
    .unwrap_or_else(|_| panic!("Something went wrong"));

let y = ndarray::Array2::<f32>::from_shape_fn((3, 4), |(r, c)| 12.0 - (c + r * 4) as f32)
    .move_to_device(ndarray::Device::OpenCL)
    .unwrap_or_else(|_| panic!("Something went wrong"));

// Only this form of binary operation is supported currently.
// i.e. reference <op> reference
// This operation takes place in a JIT-compiled kernel on the GPU
let z = &x + &y;

// You can only print something if it's in Host memory. This could be changed
// to automatically copy/move to host if necessary
println!(
    "Result:\n{:>2}",
    z.move_to_device(ndarray::Device::Host).unwrap()
);

// [[12, 12, 12, 12],
//  [12, 12, 12, 12],
//  [12, 12, 12, 12]]
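
The `reference <op> reference` restriction in the example corresponds to implementing the operator trait only for borrowed operands. Here is a minimal sketch of that pattern. It uses a hypothetical `Buf` type as a stand-in (it is not ndarray's `ArrayBase`, and it is not the code in this change), and it loops on the host where a device backend would launch a kernel.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a contiguous array; not an ndarray type.
#[derive(Debug, Clone, PartialEq)]
pub struct Buf {
    data: Vec<f32>,
}

impl Buf {
    pub fn new(data: Vec<f32>) -> Self {
        Buf { data }
    }

    pub fn as_slice(&self) -> &[f32] {
        &self.data
    }
}

/// `Add` is implemented for `&Buf + &Buf` only, so neither operand is consumed.
/// `Buf + Buf` or `Buf + &Buf` would need their own impls.
impl<'a, 'b> Add<&'b Buf> for &'a Buf {
    type Output = Buf;

    fn add(self, rhs: &'b Buf) -> Buf {
        assert_eq!(self.data.len(), rhs.data.len(), "length mismatch");
        // A device backend would enqueue a kernel here; this sketch adds on the host.
        let data = self
            .data
            .iter()
            .zip(rhs.data.iter())
            .map(|(a, b)| a + b)
            .collect();
        Buf::new(data)
    }
}
```

With this in place, `let z = &x + &y;` compiles and leaves `x` and `y` usable afterwards, while `x + y` is a compile error until the owned-operand impls are added.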

A very similar approach could be taken to get CUDA support working. It might even be possible to merge the two behind a single GPU backend trait, which would simplify the implementations quite a bit. I suspect that would require a few substantial internal changes, though I'm not sure.
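
To make the shared-trait idea a bit more concrete, here is a rough sketch of what such a trait might look like. Every name in it (`Backend`, `BinaryOp`, `HostBackend`, `HostError`, `add_on`) is hypothetical and does not exist in ndarray. `HostBackend` keeps its "device" buffers in host memory so the sketch runs without a GPU. An OpenCL or CUDA backend would implement the same three methods with real device allocations and kernels.

```rust
use std::fmt::Debug;

/// Element-wise binary operations a backend would need to support (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BinaryOp {
    Add,
    Sub,
    Mul,
    Div,
}

/// Hypothetical trait that OpenCL, CUDA, and host backends could all implement.
pub trait Backend {
    /// Backend-specific handle to a contiguous `f32` buffer.
    type Buffer;
    /// Backend-specific error type.
    type Error: Debug;

    /// Copy host data into a backend buffer.
    fn upload(&self, host: &[f32]) -> Result<Self::Buffer, Self::Error>;
    /// Copy a backend buffer back to host memory.
    fn download(&self, buf: &Self::Buffer) -> Result<Vec<f32>, Self::Error>;
    /// Apply `op` element-wise to two buffers of equal length.
    fn binary_op(
        &self,
        op: BinaryOp,
        lhs: &Self::Buffer,
        rhs: &Self::Buffer,
    ) -> Result<Self::Buffer, Self::Error>;
}

/// Stand-in backend: buffers are plain `Vec<f32>` in host memory.
pub struct HostBackend;

#[derive(Debug, PartialEq)]
pub enum HostError {
    LengthMismatch { lhs: usize, rhs: usize },
}

impl Backend for HostBackend {
    type Buffer = Vec<f32>;
    type Error = HostError;

    fn upload(&self, host: &[f32]) -> Result<Self::Buffer, Self::Error> {
        Ok(host.to_vec())
    }

    fn download(&self, buf: &Self::Buffer) -> Result<Vec<f32>, Self::Error> {
        Ok(buf.clone())
    }

    fn binary_op(
        &self,
        op: BinaryOp,
        lhs: &Self::Buffer,
        rhs: &Self::Buffer,
    ) -> Result<Self::Buffer, Self::Error> {
        if lhs.len() != rhs.len() {
            return Err(HostError::LengthMismatch {
                lhs: lhs.len(),
                rhs: rhs.len(),
            });
        }
        // A GPU backend would launch a kernel here instead of looping on the host.
        let apply = |a: f32, b: f32| match op {
            BinaryOp::Add => a + b,
            BinaryOp::Sub => a - b,
            BinaryOp::Mul => a * b,
            BinaryOp::Div => a / b,
        };
        Ok(lhs.iter().zip(rhs.iter()).map(|(&a, &b)| apply(a, b)).collect())
    }
}

/// Generic code written once against the trait: upload, add, download.
pub fn add_on<B: Backend>(backend: &B, x: &[f32], y: &[f32]) -> Result<Vec<f32>, B::Error> {
    let dx = backend.upload(x)?;
    let dy = backend.upload(y)?;
    let dz = backend.binary_op(BinaryOp::Add, &dx, &dy)?;
    backend.download(&dz)
}
```

The associated `Buffer` and `Error` types let each backend keep its own handle and error representation, while callers such as `add_on` stay generic. Whether this shape fits ndarray's existing storage traits is an open question.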

Anyway, let me know what you think!

Pencilcaseman, Mar 30 '24 18:03