GPUExample

GPGPU Example with Apple's Metal API

Project overview

KernelSelectionController.swift

Table view listing the available kernels, from which one is selected to compare CPU and GPU performance.

ViewController.swift

Performs the selected computation on both the CPU and the GPU and shows the execution time of each.
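Below is a minimal sketch of the CPU side of such a comparison. It assumes a plain loop as the CPU baseline and uses the monotonic DispatchTime clock for timing. The names measureMilliseconds and cpuSumOfCosines are hypothetical and do not come from ViewController.swift.

```swift
import Foundation
import Dispatch

// Hypothetical helper (not from the project): runs `work` once and returns its
// result together with the elapsed time in milliseconds (monotonic clock).
func measureMilliseconds<T>(_ work: () -> T) -> (result: T, milliseconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = work()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}

// Hypothetical CPU baseline for the reduce kernels: the sum of cos(x) over the input.
func cpuSumOfCosines(_ input: [Float]) -> Float {
    var sum: Float = 0
    for x in input {
        sum += cos(x)
    }
    return sum
}

// 2^20 zeros, so the expected sum is 2^20 (cos(0) = 1).
let input = [Float](repeating: 0, count: 1 << 20)
let (sum, ms) = measureMilliseconds { cpuSumOfCosines(input) }
print("CPU sum = \(sum), time = \(ms) ms")
```

The GPU measurement would wrap the command buffer submission and wait in the same way, so both timings cover only the computation being compared.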

kernel.metal

Kernels executed on the GPU through the Metal API.

map

A simple map that applies the cosine function to each element of the input array.
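As an illustration, here is a CPU simulation of the map pattern. It assumes one GPU thread per element. The loop over gid stands in for the grid of threads, and the loop body is what each thread would execute. simulateMapCos is a hypothetical name, not a function from kernel.metal.

```swift
import Foundation

// Hypothetical CPU simulation of the map kernel: one "thread" per element.
// On the GPU, each thread would receive its own index `gid`; here a loop
// visits every index in turn.
func simulateMapCos(_ input: [Float]) -> [Float] {
    var output = [Float](repeating: 0, count: input.count)
    for gid in 0..<input.count {
        // Kernel body: read one element, write one result. Threads are independent.
        output[gid] = cos(input[gid])
    }
    return output
}

let mapped = simulateMapCos([0, Float.pi / 2, Float.pi])
print(mapped)
```

No thread reads another thread's output, so this kernel needs no synchronization. That is what makes it the simplest case.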

reduce1

Naive parallel reduction: computes the sum of the cosines of all input array elements.

[Figure: reduce1]
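The next sketch is a CPU simulation of a naive reduction. It assumes reduce1 follows the first kernel in Harris's paper: interleaved addressing, where only threads whose tid is a multiple of 2*s do work. The function name simulateReduce1 and the threadsPerGroup parameter are hypothetical. The loop over tid stands in for the threads of one threadgroup. Each pass of the while loop corresponds to one step that, on the GPU, would end with threadgroup_barrier(mem_flags::mem_threadgroup).

```swift
import Foundation

// Hypothetical CPU simulation of reduce1, modeled on Harris's first kernel
// (interleaved addressing with a divergent branch). Not the project's Metal code.
// Assumes threadsPerGroup is a power of two that divides input.count.
func simulateReduce1(_ input: [Float], threadsPerGroup: Int) -> Float {
    var partialSums: [Float] = []
    for group in 0..<(input.count / threadsPerGroup) {
        // Stand-in for threadgroup memory: each thread loads cos of one element.
        var shared = [Float](repeating: 0, count: threadsPerGroup)
        for tid in 0..<threadsPerGroup {
            shared[tid] = cos(input[group * threadsPerGroup + tid])
        }
        // Each pass is one reduction step (followed by a barrier on the GPU).
        var s = 1
        while s < threadsPerGroup {
            // Only threads whose tid is a multiple of 2*s work, so the active
            // threads are scattered across the group (divergent on a GPU).
            for tid in 0..<threadsPerGroup where tid % (2 * s) == 0 {
                shared[tid] += shared[tid + s]
            }
            s *= 2
        }
        // Thread 0 now holds this group's partial sum.
        partialSums.append(shared[0])
    }
    // Assumption: the per-group partial sums are combined on the CPU afterwards.
    return partialSums.reduce(0, +)
}

let total1 = simulateReduce1([Float](repeating: 0, count: 1024), threadsPerGroup: 256)
print(total1)  // prints 1024.0, since cos(0) = 1
```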

reduce2

Same reduction as reduce1, but changes which threads perform the additions at each step.

[Figure: reduce2]
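This sketch assumes reduce2 matches the second kernel in Harris's paper. Instead of filtering threads by tid % (2*s), each thread computes a strided index 2*s*tid. As a result, the working threads are tid = 0, 1, 2, and so on, which avoids the divergent branch. Harris notes that this strided pattern introduces shared-memory bank conflicts. simulateReduce2 is a hypothetical name, and the CPU loop only simulates the threads.

```swift
import Foundation

// Hypothetical CPU simulation of reduce2 (Harris's strided-index variant).
// Assumes threadsPerGroup is a power of two that divides input.count.
func simulateReduce2(_ input: [Float], threadsPerGroup: Int) -> Float {
    var partialSums: [Float] = []
    for group in 0..<(input.count / threadsPerGroup) {
        var shared = [Float](repeating: 0, count: threadsPerGroup)
        for tid in 0..<threadsPerGroup {
            shared[tid] = cos(input[group * threadsPerGroup + tid])
        }
        var s = 1
        while s < threadsPerGroup {
            for tid in 0..<threadsPerGroup {
                // Thread tid handles the pair starting at 2*s*tid, so the
                // active threads are the contiguous range 0..<(T / (2*s)).
                let index = 2 * s * tid
                if index < threadsPerGroup {
                    shared[index] += shared[index + s]
                }
            }
            s *= 2
        }
        partialSums.append(shared[0])
    }
    return partialSums.reduce(0, +)
}

let total2 = simulateReduce2([Float](repeating: 0, count: 1024), threadsPerGroup: 256)
print(total2)  // prints 1024.0
```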

reduce3

Threads access contiguous memory locations at each step.

[Figure: reduce3]
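Here is a sketch of sequential addressing, assuming reduce3 follows the third kernel in Harris's paper. The stride starts at half the group size and halves each step. Thread tid adds element tid + s into element tid, so each step touches two contiguous halves of the array. simulateReduce3 is a hypothetical CPU simulation, not the project's kernel.

```swift
import Foundation

// Hypothetical CPU simulation of reduce3 (sequential addressing).
// Assumes threadsPerGroup is a power of two that divides input.count.
func simulateReduce3(_ input: [Float], threadsPerGroup: Int) -> Float {
    var partialSums: [Float] = []
    for group in 0..<(input.count / threadsPerGroup) {
        var shared = [Float](repeating: 0, count: threadsPerGroup)
        for tid in 0..<threadsPerGroup {
            shared[tid] = cos(input[group * threadsPerGroup + tid])
        }
        // Stride goes T/2, T/4, ..., 1. Threads 0..<s read the contiguous
        // block [s, 2s) and write the contiguous block [0, s).
        var s = threadsPerGroup / 2
        while s > 0 {
            for tid in 0..<s {
                shared[tid] += shared[tid + s]
            }
            s /= 2
        }
        partialSums.append(shared[0])
    }
    return partialSums.reduce(0, +)
}

let total3 = simulateReduce3([Float](repeating: 0, count: 1024), threadsPerGroup: 256)
print(total3)  // prints 1024.0
```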

reduce4

The same as reduce3, but the first reduction step is performed while copying data to shared (threadgroup) memory, so only half as many threads are needed as in the previous reduce versions.
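The following is a sketch of "first add during load". Each of the threadsPerGroup threads loads two elements that are threadsPerGroup apart and stores their sum, so one group covers 2 * threadsPerGroup elements. The remaining steps use sequential addressing. simulateReduce4 is hypothetical, and the sketch assumes input.count is a multiple of 2 * threadsPerGroup.

```swift
import Foundation

// Hypothetical CPU simulation of reduce4: the first addition happens in the load.
// Assumes threadsPerGroup is a power of two and input.count is a multiple
// of 2 * threadsPerGroup.
func simulateReduce4(_ input: [Float], threadsPerGroup: Int) -> Float {
    let elementsPerGroup = 2 * threadsPerGroup  // each thread covers two elements
    var partialSums: [Float] = []
    for group in 0..<(input.count / elementsPerGroup) {
        var shared = [Float](repeating: 0, count: threadsPerGroup)
        for tid in 0..<threadsPerGroup {
            let i = group * elementsPerGroup + tid
            // First reduction step done while copying into threadgroup memory.
            shared[tid] = cos(input[i]) + cos(input[i + threadsPerGroup])
        }
        var s = threadsPerGroup / 2
        while s > 0 {
            for tid in 0..<s {
                shared[tid] += shared[tid + s]
            }
            s /= 2
        }
        partialSums.append(shared[0])
    }
    return partialSums.reduce(0, +)
}

// 1024 elements need only 2 groups of 256 threads (512 threads in total).
let total4 = simulateReduce4([Float](repeating: 0, count: 1024), threadsPerGroup: 256)
print(total4)  // prints 1024.0
```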

NOTICE

  • Source of the graphics presenting the reduction optimization steps: Optimizing Parallel Reduction in CUDA by Mark Harris, https://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf

  • This example only works for input arrays whose size is a positive integer power of 2. As a simple exercise, you can try to make it more flexible.
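One possible approach to the exercise is sketched below as an assumption, not as the project's solution. It launches enough groups to cover every element (ceiling division) and lets threads whose index falls past the end load 0, the identity for addition. Padding the input array itself with zeros would be wrong for this computation, because cos(0) = 1 would add 1 for every padded element. isPowerOfTwo and simulateReduceAnySize are hypothetical names. The threadgroup size must still be a power of two for the sequential-addressing steps.

```swift
import Foundation

// True if n is a positive integer power of two (the current size restriction).
func isPowerOfTwo(_ n: Int) -> Bool {
    return n > 0 && (n & (n - 1)) == 0
}

// Hypothetical CPU simulation of a reduce4-style kernel that accepts any input size.
func simulateReduceAnySize(_ input: [Float], threadsPerGroup: Int) -> Float {
    precondition(isPowerOfTwo(threadsPerGroup), "threadgroup size must be a power of two")
    let n = input.count
    let elementsPerGroup = 2 * threadsPerGroup
    let groupCount = (n + elementsPerGroup - 1) / elementsPerGroup  // ceiling division
    var partialSums: [Float] = []
    for group in 0..<groupCount {
        var shared = [Float](repeating: 0, count: threadsPerGroup)
        for tid in 0..<threadsPerGroup {
            let i = group * elementsPerGroup + tid
            let j = i + threadsPerGroup
            // Out-of-range threads contribute 0 instead of cos(padding).
            let a: Float = i < n ? cos(input[i]) : 0
            let b: Float = j < n ? cos(input[j]) : 0
            shared[tid] = a + b
        }
        var s = threadsPerGroup / 2
        while s > 0 {
            for tid in 0..<s {
                shared[tid] += shared[tid + s]
            }
            s /= 2
        }
        partialSums.append(shared[0])
    }
    return partialSums.reduce(0, +)
}

print(simulateReduceAnySize([Float](repeating: 0, count: 1000), threadsPerGroup: 256))  // prints 1000.0
```

On the GPU, the same bounds checks would go into the kernel's load step, with the element count passed in as an extra argument.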