lesson-gpu-programming
Increasing work per thread/block
We spent (rightly so, I believe) some time expanding our vector_add
example into code that can run on vectors of arbitrary size.
But what if the vectors are so large that having one thread per element is no longer practical?
To handle this case, we need to introduce two concepts: 1) increasing the amount of work per thread, and 2) increasing the amount of work per block.