
Support of shared memory

Open jinz2014 opened this issue 1 year ago • 7 comments

Can you please explain how to use shared memory in a kernel? Does the Warp compiler optimize a kernel that uses shared memory? Thanks.

jinz2014 avatar Jul 16 '24 23:07 jinz2014

For example, a dot product of two arrays.

I suppose the warp.dot() function computes the dot product of two vectors, where each vector is an element of an array.
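
Here is roughly what I mean, a minimal sketch with illustrative names and sizes; each thread computes the dot product of one pair of vec3 elements:

    import warp as wp

    wp.init()

    @wp.kernel
    def dot_kernel(a: wp.array(dtype=wp.vec3),
                   b: wp.array(dtype=wp.vec3),
                   out: wp.array(dtype=float)):
        i = wp.tid()                  # one thread per array element
        out[i] = wp.dot(a[i], b[i])   # dot product of two vec3 elements

    n = 1024
    a = wp.zeros(n, dtype=wp.vec3)
    b = wp.zeros(n, dtype=wp.vec3)
    out = wp.zeros(n, dtype=float)
    wp.launch(dot_kernel, dim=n, inputs=[a, b, out])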

jinz2014 avatar Jul 16 '24 23:07 jinz2014

Hi @jinz2014 . Warp doesn't support shared memory in kernels directly, but you are free to use shared memory in native function snippets: https://nvidia.github.io/warp/modules/differentiability.html#custom-native-functions
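
Roughly, the pattern from that page looks like this: a sketch of a block-level sum using shared memory inside a native snippet (sizes and names are illustrative, and it assumes the kernel is launched with exactly 128 threads):

    import warp as wp

    # CUDA source injected verbatim into the generated kernel;
    # arr, out, and tid are the arguments of the decorated function
    snippet = """
        __shared__ int sum[128];

        sum[tid] = arr[tid];
        __syncthreads();

        if (tid == 0) {
            for (int i = 1; i < 128; ++i) {
                sum[0] += sum[i];
            }
            out[0] = sum[0];
        }
    """

    @wp.func_native(snippet)
    def block_reduce(arr: wp.array(dtype=int), out: wp.array(dtype=int), tid: int):
        ...

    @wp.kernel
    def reduce_kernel(arr: wp.array(dtype=int), out: wp.array(dtype=int)):
        block_reduce(arr, out, wp.tid())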

daedalus5 avatar Jul 17 '24 14:07 daedalus5

Hi @daedalus5, I see. Will developers need to compute the local ID (i.e., threadIdx.x) within a thread block themselves? I think wp.tid() means the global ID.

jinz2014 avatar Jul 17 '24 14:07 jinz2014

Are there functions for the local ID, thread block size, and thread block ID?

jinz2014 avatar Jul 17 '24 14:07 jinz2014

Yes, wp.tid() is a global ID. We don't have Python functions for those, but you should be able to access, e.g., threadIdx.x in a native snippet as you would normally.
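
For instance, something like this sketch (CUDA device only; names are illustrative) would expose the built-in indices inside a snippet:

    snippet = """
        int local_id = threadIdx.x;                   // thread index within the block
        int block_id = blockIdx.x;                    // block index within the grid
        int block_dim = blockDim.x;                   // threads per block
        out[tid] = block_id * block_dim + local_id;   // matches the global ID for a 1D launch
    """

    @wp.func_native(snippet)
    def thread_ids(out: wp.array(dtype=int), tid: int):
        ...

    @wp.kernel
    def ids_kernel(out: wp.array(dtype=int)):
        thread_ids(out, wp.tid())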

daedalus5 avatar Jul 17 '24 15:07 daedalus5

Do snippets support template types?

    snippet = '''
        __shared__ T sum[256];
    '''

jinz2014 avatar Jul 18 '24 14:07 jinz2014

No, I don't think templates would work in snippets currently.
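
One possible workaround (not a Warp feature, just ordinary Python string substitution) is to generate a concrete snippet per type, something like:

    # instantiate the snippet for each concrete C type you need,
    # in place of a template parameter
    def make_snippet(ctype: str) -> str:
        return f"""
            __shared__ {ctype} sum[256];
            // ... reduction body using sum ...
        """

    @wp.func_native(make_snippet("float"))
    def reduce_float(arr: wp.array(dtype=float), tid: int):
        ...

    @wp.func_native(make_snippet("int"))
    def reduce_int(arr: wp.array(dtype=int), tid: int):
        ...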

daedalus5 avatar Jul 18 '24 19:07 daedalus5