
Consider re-introducing CUDA `__constant__` arrays, but beware

Open lukstafi opened this issue 2 years ago • 0 comments

I'll remove `__constant__` for now to simplify the code. It requires the constant arrays to be copied for each module.

Beware: `__constant__` memory is not necessarily faster than global memory when different threads access different locations; see CUDA Constant Memory Best Practices. So `__constant__` might only start making sense for elaborate optimizations, after or related to tiling.
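To make the caveat concrete, here is a minimal sketch using the CUDA runtime API. It is not OCANNL's generated code: the names `c_table`, `uniform_read`, `scattered_read`, `scattered_read_global`, and `run_demo` are hypothetical, as is the table size. The `cudaMemcpyToSymbol` call stands in for the copy step that would have to be repeated for each module. In `uniform_read` all threads of a warp read the same constant address, which the constant cache can broadcast; in `scattered_read` threads of a warp read different addresses, which the hardware splits into separate requests, so a plain global-memory read (`scattered_read_global`) may do just as well or better. The sketch only checks correctness and makes no timing claims.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256        // table size (illustrative)
#define THREADS 1024 // total threads launched (illustrative)

// Hypothetical lookup table living in constant memory.
__constant__ float c_table[N];

// All threads in a warp read the same address: a single broadcast.
__global__ void uniform_read(float *out, int k) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  out[i] = c_table[k];
}

// Threads in a warp read different constant addresses: the request is
// split into one request per distinct address.
__global__ void scattered_read(float *out) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  out[i] = c_table[i % N];
}

// The same scattered pattern through ordinary global memory.
__global__ void scattered_read_global(float *out, const float *table) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  out[i] = table[i % N];
}

static bool ok(cudaError_t e) {
  if (e != cudaSuccess) fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
  return e == cudaSuccess;
}

// Runs the three kernels and checks that they return the expected values.
bool run_demo() {
  static float host[N], result[THREADS];
  for (int i = 0; i < N; ++i) host[i] = 0.5f * i;

  float *d_out = nullptr, *d_table = nullptr;
  if (!ok(cudaMalloc(&d_out, sizeof(result)))) return false;
  if (!ok(cudaMalloc(&d_table, sizeof(host)))) return false;

  // The copy that a __constant__ array needs before kernels can use it.
  if (!ok(cudaMemcpyToSymbol(c_table, host, sizeof(host)))) return false;
  if (!ok(cudaMemcpy(d_table, host, sizeof(host), cudaMemcpyHostToDevice))) return false;

  bool good = true;
  const int k = 7;

  uniform_read<<<THREADS / 256, 256>>>(d_out, k);
  good = good && ok(cudaMemcpy(result, d_out, sizeof(result), cudaMemcpyDeviceToHost));
  for (int i = 0; i < THREADS; ++i) good = good && (result[i] == host[k]);

  scattered_read<<<THREADS / 256, 256>>>(d_out);
  good = good && ok(cudaMemcpy(result, d_out, sizeof(result), cudaMemcpyDeviceToHost));
  for (int i = 0; i < THREADS; ++i) good = good && (result[i] == host[i % N]);

  scattered_read_global<<<THREADS / 256, 256>>>(d_out, d_table);
  good = good && ok(cudaMemcpy(result, d_out, sizeof(result), cudaMemcpyDeviceToHost));
  for (int i = 0; i < THREADS; ++i) good = good && (result[i] == host[i % N]);

  cudaFree(d_out);
  cudaFree(d_table);
  return good;
}

int main() {
  bool good = run_demo();
  printf("%s\n", good ? "all kernels returned expected values" : "mismatch or CUDA error");
  return good ? 0 : 1;
}
```

Whether the constant-memory variants actually win depends on the access pattern the generated kernels end up with, which is why revisiting this alongside tiling seems reasonable.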

lukstafi, Aug 30 '23 20:08