dfdx
CUDA kernels: JIT vs. build-time compilation
Should CUDA kernels be JIT compiled at runtime, or compiled ahead of time when the program is built? Ideally we can support both easily, via a feature flag or some other mechanism. JIT would give quicker builds, but pays the compilation cost at runtime.
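To make the feature flag idea concrete, here is a minimal sketch of how one crate could pick a compilation mode at build time. The feature name `jit` and the names `KernelCompilation` and `kernel_compilation` are made up for illustration; nothing here is an existing dfdx API.

```rust
/// Hypothetical: how kernels get turned into something the GPU can load.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KernelCompilation {
    /// Kernels are compiled while the crate is built (slower builds, no runtime cost).
    BuildTime,
    /// Kernels are compiled the first time they are needed (faster builds, runtime cost).
    Jit,
}

/// Resolved at compile time from a hypothetical `jit` cargo feature.
/// With the feature disabled (the default), this returns `BuildTime`.
const fn kernel_compilation() -> KernelCompilation {
    if cfg!(feature = "jit") {
        KernelCompilation::Jit
    } else {
        KernelCompilation::BuildTime
    }
}
```

Under this assumption, `cargo build --features jit` would switch the mode; everything downstream can branch on `kernel_compilation()` or use `#[cfg(feature = "jit")]` directly.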
Related to #9
A relevant resource from PyTorch land: https://dev-discuss.pytorch.org/t/keeping-pytorchs-ops-maintainable-the-jiterator/468
I think this can be done with two separate devices: CudaJIT and Cuda. They can share the underlying kernel code, but their impls can construct the kernels differently.
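A rough sketch of the two-device idea, assuming a shared trait that each device implements differently. `KernelProvider`, `KernelModule`, and `ADD_KERNEL_SRC` are hypothetical names, and the PTX string is a placeholder for whatever a build step would actually produce; no real CUDA calls are made.

```rust
/// Kernel source shared by both devices (CUDA C, kept as text).
const ADD_KERNEL_SRC: &str = r#"
extern "C" __global__ void add(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { out[i] = a[i] + b[i]; }
}
"#;

/// Hypothetical: the form in which a device holds a kernel before loading it.
#[derive(Debug, PartialEq)]
enum KernelModule {
    /// PTX generated at build time from the shared source and embedded in the binary.
    Precompiled { ptx: &'static str },
    /// Raw source that still has to be compiled at runtime (e.g. through NVRTC).
    Source { src: &'static str },
}

/// Hypothetical trait: each device decides how its kernels are constructed.
trait KernelProvider {
    fn add_kernel(&self) -> KernelModule;
}

/// Device whose kernels are compiled when the program is built.
struct Cuda;

/// Device whose kernels are compiled on first use.
struct CudaJIT;

impl KernelProvider for Cuda {
    fn add_kernel(&self) -> KernelModule {
        // Placeholder: a real build script would compile ADD_KERNEL_SRC to PTX
        // and this would embed the result, for example with `include_str!`.
        KernelModule::Precompiled { ptx: "<ptx generated at build time>" }
    }
}

impl KernelProvider for CudaJIT {
    fn add_kernel(&self) -> KernelModule {
        // The same source is handed over unchanged for runtime compilation.
        KernelModule::Source { src: ADD_KERNEL_SRC }
    }
}
```

Code that is generic over `KernelProvider` would not need to know which of the two devices it runs on; only the construction step differs.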
rust-cuda currently does not include a binding to NVRTC, so one will have to be added.