Scientific-Programming-in-Julia

Ideas for improving lecture/lab 11

Open janfrancu opened this issue 3 years ago • 0 comments

  • Add more references to included images.

  • Small note on NVIDIA/AMD framework comparison

    • SIMT/SIMD comparison (AMD calls a group of what NVIDIA terms CUDA cores a SIMD unit)
    • a warp is 32 threads, whereas the equivalent wavefront on AMD has 64
    • on AMDGPU the requirement for warp alignment can be broken; however, as a result we cannot switch context as easily
  • there is a nice picture showing how work over a 1D/2D array is divided into blocks, threads, etc. (https://youtu.be/LG9G4aA28rU?t=368); an analogous picture should be found somewhere or created manually

    • the x property suggests that the execution can be partitioned along a three-dimensional cube (three nested for loops); this should be explained better, with pictures added
      • such as https://developer-blogs.nvidia.com/wp-content/uploads/2017/01/cuda_indexing-1024x463.png (from https://developer.nvidia.com/blog/even-easier-introduction-cuda/)
      • or http://2.bp.blogspot.com/-gSMyMA7nnIU/UPbTSAgV_CI/AAAAAAAAAtE/2D942iHCg9Q/s1600/gpu2.png
  • the part on using the GPU without writing kernels should mention the scalar indexing problem, which we often encounter (e.g. comparing a CPU array and a GPU array without moving them to the same place; show how methods fall back to scalar getindex ops)
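
A possible starting point for the indexing explanation requested above: a minimal CPU-only sketch (plain Julia, no CUDA.jl or GPU required) that emulates how a 1D launch grid maps (block, thread) pairs to array elements. The helper names `global_index` and `emulate_launch` are made up for illustration, and the block size of 4 is an arbitrary assumption; on a real GPU the loop iterations would run in parallel rather than sequentially.

```julia
# CPU-only emulation of a CUDA-style 1D launch grid (no GPU or CUDA.jl needed).
# `global_index` and `emulate_launch` are hypothetical helpers, not library functions.

# Global 1-based index of a thread, mirroring the usual CUDA.jl idiom
#   (blockIdx().x - 1) * blockDim().x + threadIdx().x
global_index(block_idx, block_dim, thread_idx) = (block_idx - 1) * block_dim + thread_idx

# Visit every (block, thread) pair of a 1D grid and record which element it owns.
function emulate_launch(n; threads = 4)
    blocks = cld(n, threads)            # enough blocks to cover all n elements
    owner = fill((0, 0), n)             # owner[i] = (block, thread) handling element i
    for b in 1:blocks, t in 1:threads   # on a GPU these iterations run in parallel
        i = global_index(b, threads, t)
        i <= n || continue              # bounds guard for the partially filled last block
        owner[i] = (b, t)
    end
    return owner
end

owner = emulate_launch(10; threads = 4)
```

With 10 elements and 4 threads per block, three blocks are launched and the last block leaves two threads idle, which is why the bounds guard is needed. The same formula applies independently to the y and z components when the grid is two- or three-dimensional.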

janfrancu, Jan 11 '22 08:01