GPUExample
GPGPU Example with Apple's Metal API
Project overview
KernelSelectionController.swift
Table view listing the available kernels for comparing CPU and GPU performance.
ViewController.swift
Performs the CPU and GPU computations and shows their execution times.
kernel.metal
Kernels executed on the GPU through the Metal API.
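For reference, the CPU side of the comparison boils down to summing the cosines of the input elements and timing that work. The Swift sketch below is not the code from ViewController.swift; `cpuSumOfCosines` and `measure` are hypothetical names, and timing uses `DispatchTime` from Dispatch.

```swift
import Foundation
import Dispatch

/// CPU baseline: sum of cos(x) over all input elements (the value the reduce kernels compute).
func cpuSumOfCosines(_ input: [Float]) -> Float {
    return input.reduce(0) { $0 + cos($1) }
}

/// Runs `body` once and returns its result with the elapsed wall-clock time in milliseconds.
func measure<T>(_ body: () -> T) -> (result: T, milliseconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = body()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}

// 2^20 elements: a power of two, matching the size restriction of the GPU kernels.
let input = (0..<(1 << 20)).map { Float($0) * 0.001 }
let (sum, ms) = measure { cpuSumOfCosines(input) }
print("CPU sum: \(sum), time: \(ms) ms")
```

A single run is noisy; averaging several runs gives a fairer number to put next to the GPU timing.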
map
A simple map that applies the cosine function to each element of the input array.
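The map pattern can be illustrated on the CPU: in the sketch below, each iteration of `DispatchQueue.concurrentPerform` plays the role of one GPU thread and uses its index the way a Metal kernel uses its thread position in the grid. `parallelCosineMap` is a hypothetical name; the real kernel in kernel.metal runs on the GPU.

```swift
import Foundation
import Dispatch

/// CPU analogue of the `map` kernel: output[gid] = cos(input[gid]), one "thread" per element.
func parallelCosineMap(_ input: [Float]) -> [Float] {
    var output = [Float](repeating: 0, count: input.count)
    output.withUnsafeMutableBufferPointer { buffer in
        let out = buffer  // writes through the buffer pointer do not mutate `out` itself
        // Iterations are independent and each writes only its own slot, like GPU threads.
        DispatchQueue.concurrentPerform(iterations: input.count) { gid in
            out[gid] = cos(input[gid])
        }
    }
    return output
}

print(parallelCosineMap([0, 1, 2, 3]))
```

Because no element depends on any other, the work parallelizes with no synchronization, which is why map is the easiest kernel to move to the GPU.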
reduce1
Naive parallel reduction that computes the sum of the cosines of all input array elements.
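The sketch below emulates one threadgroup of the naive reduction on the CPU, assuming it follows the first kernel in Harris's slides (interleaved addressing: at step s, only threads whose id is a multiple of 2 * s do work). The inner loop over `tid` stands in for the threads that run between two barriers; `reduce1Emulated` is a hypothetical name.

```swift
import Foundation

/// Emulates one threadgroup of a naive (interleaved-addressing) reduction.
/// `shared` plays the role of threadgroup memory; its count must be a power of two.
func reduce1Emulated(_ input: [Float]) -> Float {
    var shared = input.map { cos($0) }  // each thread loads cos(input[tid])
    let n = shared.count
    var s = 1
    while s < n {
        // Only threads with tid % (2 * s) == 0 add; on a GPU the rest idle (divergent branch).
        for tid in 0..<n where tid % (2 * s) == 0 {
            shared[tid] += shared[tid + s]
        }
        s *= 2  // a real kernel needs a threadgroup barrier before the next step
    }
    return shared.first ?? 0
}

print(reduce1Emulated([0, 0, 0, 0]))
```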
reduce2
Changes which threads perform the reduction, so that the active threads are contiguous.
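Assuming reduce2 corresponds to the second kernel in Harris's slides, only the thread-to-index mapping changes: thread tid works on index 2 * s * tid, so at every step the active threads are 0, 1, 2 and so on rather than every (2 * s)-th thread, which in Harris's analysis removes the divergent branch. The CPU emulation below (hypothetical name `reduce2Emulated`) produces the same result as the naive version.

```swift
import Foundation

/// Emulates one threadgroup of the strided-index reduction; input count must be a power of two.
func reduce2Emulated(_ input: [Float]) -> Float {
    var shared = input.map { cos($0) }
    let n = shared.count
    var s = 1
    while s < n {
        // Contiguous threads 0..<(n / (2 * s)) are active; thread tid handles index 2 * s * tid.
        for tid in 0..<(n / (2 * s)) {
            let index = 2 * s * tid
            shared[index] += shared[index + s]
        }
        s *= 2  // barrier between steps on a real GPU
    }
    return shared.first ?? 0
}

print(reduce2Emulated([0, 0, 0, 0]))
```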
reduce3
Accesses contiguous memory locations (sequential addressing).
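With sequential addressing (Harris's third kernel, which reduce3 appears to follow), the stride starts at half the array and halves at each step, and thread tid adds element tid + s into slot tid. Active threads therefore read and write neighbouring memory, which in Harris's analysis avoids the shared-memory bank conflicts of the strided version. A CPU emulation, again under a hypothetical name:

```swift
import Foundation

/// Emulates one threadgroup of the sequential-addressing reduction; count must be a power of two.
func reduce3Emulated(_ input: [Float]) -> Float {
    var shared = input.map { cos($0) }
    var s = shared.count / 2
    while s > 0 {
        // Threads 0..<s each add the element s positions away: contiguous reads and writes.
        for tid in 0..<s {
            shared[tid] += shared[tid + s]
        }
        s /= 2  // barrier between steps on a real GPU
    }
    return shared.first ?? 0
}

print(reduce3Emulated([0, 0, 0, 0]))
```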
reduce4
Same as reduce3, but the first reduction step is performed while copying data to shared memory, so only half as many threads are needed as in the previous reduce versions.
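The "first add during load" idea can be emulated on the CPU as below: a threadgroup of T threads covers 2 * T input elements, each thread adds two cosines while filling threadgroup memory, and the rest is the reduce3 loop. The function returns one partial sum per threadgroup; adding those partials on the CPU is an assumption about how the host finishes the job, and `reduce4Emulated` and `threadsPerGroup` are hypothetical names.

```swift
import Foundation

/// Emulates the "first add during load" reduction. Each threadgroup of `threadsPerGroup`
/// threads reduces 2 * threadsPerGroup elements and yields one partial sum.
/// Assumes threadsPerGroup is a power of two and input.count is a multiple of 2 * threadsPerGroup.
func reduce4Emulated(_ input: [Float], threadsPerGroup: Int) -> [Float] {
    precondition(threadsPerGroup > 0, "threadsPerGroup must be positive")
    let groupCount = input.count / (2 * threadsPerGroup)
    var partialSums: [Float] = []
    for group in 0..<groupCount {
        var shared = [Float](repeating: 0, count: threadsPerGroup)
        // First reduction step happens during the load: each thread reads two elements.
        for tid in 0..<threadsPerGroup {
            let i = group * 2 * threadsPerGroup + tid
            shared[tid] = cos(input[i]) + cos(input[i + threadsPerGroup])
        }
        // Remaining steps use sequential addressing, as in reduce3.
        var s = threadsPerGroup / 2
        while s > 0 {
            for tid in 0..<s { shared[tid] += shared[tid + s] }
            s /= 2
        }
        partialSums.append(shared[0])
    }
    return partialSums
}

let partials = reduce4Emulated([Float](repeating: 0, count: 8), threadsPerGroup: 2)
print(partials, partials.reduce(0, +))
```

With 8 zeros and 2 threads per group, two threadgroups (4 threads in total) handle 8 elements and each yields a partial sum of 4.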
NOTICE
- Source of the graphics presenting the reduction optimization steps: Optimizing Parallel Reduction in CUDA by Mark Harris, https://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf
- This example only works for input arrays whose size is a positive integer power of 2. As a simple exercise, you can try to make it more flexible.
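One possible approach to the exercise, sketched in Swift under the assumption that the kernel can be told the real element count: check whether the count is already a power of two, round the thread count up to the next power of two, and let out-of-range threads load 0 (the identity for addition) instead of reading past the end. The 0 must be used in place of the cosine, not as padded input, because padding the input with zeros would add cos(0) = 1 per extra element. `isPowerOfTwo`, `nextPowerOfTwo`, `loadValue` and `sumOfCosinesAnySize` are hypothetical helpers.

```swift
import Foundation

/// True when n is a positive integer power of two (1, 2, 4, 8, ...).
func isPowerOfTwo(_ n: Int) -> Bool {
    return n > 0 && (n & (n - 1)) == 0
}

/// Smallest power of two that is >= n (for n >= 1).
func nextPowerOfTwo(_ n: Int) -> Int {
    var p = 1
    while p < n { p <<= 1 }
    return p
}

/// What thread `tid` would load into threadgroup memory: cos of its element,
/// or 0 (the additive identity) when it lies past the end of the real input.
func loadValue(_ input: [Float], _ tid: Int) -> Float {
    return tid < input.count ? cos(input[tid]) : 0
}

/// Sequential-addressing reduction over a padded buffer, so any input size works.
func sumOfCosinesAnySize(_ input: [Float]) -> Float {
    let paddedCount = nextPowerOfTwo(max(input.count, 1))
    var shared = (0..<paddedCount).map { loadValue(input, $0) }
    var s = paddedCount / 2
    while s > 0 {
        for tid in 0..<s { shared[tid] += shared[tid + s] }
        s /= 2
    }
    return shared[0]
}

print(sumOfCosinesAnySize([0, 0, 0]))
```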