ocannl
Consider re-introducing CUDA `__constant__` arrays, but beware
I'll remove `__constant__` for now to simplify the code, since it requires the constant arrays to be copied separately for each module.
Beware: `__constant__` is not necessarily faster than global memory when different threads access different locations, because constant-memory reads to distinct addresses within a warp are serialized (see "CUDA Constant Memory Best Practices"). So `__constant__` might only start making sense for elaborate optimizations, after or related to tiling.
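To make the access-pattern concern concrete, here is a minimal sketch using the CUDA runtime API. It is not OCANNL's generated code: the names `const_table`, `uniform_read`, `per_thread_read_const`, `per_thread_read_global` and `constant_vs_global_demo` are hypothetical, and the table size is arbitrary. It only checks that all three kernels return the right values; the performance difference would need profiling on the target GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256

// Hypothetical lookup table, for illustration only.
__constant__ float const_table[N];

// All threads of a warp read the same address: the favorable case for
// constant memory, since the value can be broadcast to the whole warp.
__global__ void uniform_read(float *out, int k) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < N) out[i] = const_table[k];
}

// Each thread reads a different address: constant-memory reads to distinct
// addresses within a warp are serialized.
__global__ void per_thread_read_const(float *out) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < N) out[i] = const_table[i];
}

// The same per-thread pattern from global memory: consecutive threads read
// consecutive addresses, so the loads can be coalesced.
__global__ void per_thread_read_global(const float *table, float *out) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < N) out[i] = table[i];
}

// Returns the number of mismatching elements, or -1 if a CUDA call failed.
int constant_vs_global_demo() {
  float host[N], result[N];
  for (int i = 0; i < N; ++i) host[i] = (float)i;

  float *d_table = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_table, sizeof host) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, sizeof host) != cudaSuccess) return -1;

  // The global-memory copy and the constant-memory copy are separate uploads.
  cudaMemcpy(d_table, host, sizeof host, cudaMemcpyHostToDevice);
  cudaMemcpyToSymbol(const_table, host, sizeof host);

  int mismatches = 0;

  per_thread_read_const<<<1, N>>>(d_out);
  cudaMemcpy(result, d_out, sizeof host, cudaMemcpyDeviceToHost);
  for (int i = 0; i < N; ++i) if (result[i] != host[i]) ++mismatches;

  per_thread_read_global<<<1, N>>>(d_table, d_out);
  cudaMemcpy(result, d_out, sizeof host, cudaMemcpyDeviceToHost);
  for (int i = 0; i < N; ++i) if (result[i] != host[i]) ++mismatches;

  uniform_read<<<1, N>>>(d_out, 7);
  cudaMemcpy(result, d_out, sizeof host, cudaMemcpyDeviceToHost);
  for (int i = 0; i < N; ++i) if (result[i] != host[7]) ++mismatches;

  if (cudaGetLastError() != cudaSuccess) mismatches = -1;
  cudaFree(d_table);
  cudaFree(d_out);
  return mismatches;
}
```

Calling `constant_vs_global_demo()` from a `main` and timing each kernel (for example with `cudaEvent_t` or Nsight Compute) would show whether `per_thread_read_const` is actually slower than `per_thread_read_global` on a given device. With the driver API, the per-module cost mentioned above would show up as one `cuModuleGetGlobal` lookup plus one host-to-device copy for every loaded module that declares the symbol.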