[FEATURE REQUEST] NVTX Ranges
cuDF plans to adopt Jitify for more of our kernels and UDFs. To recommend them effectively to our customers for their workloads, we need to understand Jitify's behavior and performance characteristics.
We need the following NVTX regions:
- JIT Compilation time ranges
- Memory or Disk Cache Load time ranges
- JIT cache hit rates
Additionally, we'd need:
- A way to disable caching. This is important for benchmarking because the benchmarks run for multiple iterations, and with caching enabled only the first iteration would measure compilation.
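To make the request concrete, here is a minimal sketch of the kind of scoped range we have in mind. It is not Jitify code: `RangeLog`, `ScopedRange`, `load_kernel`, and the range names are hypothetical, and the logger stands in for the profiler so the snippet compiles without the CUDA toolkit. In a real build, `push` and `pop` would presumably forward to the NVTX C API calls `nvtxRangePushA` and `nvtxRangePop`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a profiler backend. In a real build these two
// methods would forward to NVTX (nvtxRangePushA / nvtxRangePop).
struct RangeLog {
  std::vector<std::string> events;  // "push:<name>" or "pop", in call order
  int depth = 0;

  void push(const std::string& name) {
    events.push_back("push:" + name);
    ++depth;
  }
  void pop() {
    assert(depth > 0);  // a pop must always match an earlier push
    events.push_back("pop");
    --depth;
  }
};

// RAII guard: opens a range on construction and closes it on scope exit,
// including early returns and exceptions.
class ScopedRange {
 public:
  ScopedRange(RangeLog& log, const std::string& name) : log_(log) {
    log_.push(name);
  }
  ~ScopedRange() { log_.pop(); }
  ScopedRange(const ScopedRange&) = delete;
  ScopedRange& operator=(const ScopedRange&) = delete;

 private:
  RangeLog& log_;
};

// Hypothetical load path: one range around the cache lookup, and a nested
// compile range that is opened only on a cache miss.
inline void load_kernel(RangeLog& log, bool cached) {
  ScopedRange lookup(log, "jit_cache_lookup");
  if (!cached) {
    ScopedRange compile(log, "jit_compile");
    // ... compilation would happen here ...
  }
}

// Runs one miss followed by one hit and returns the recorded events.
inline std::vector<std::string> demo_ranges() {
  RangeLog log;
  load_kernel(log, /*cached=*/false);
  load_kernel(log, /*cached=*/true);
  return log.events;
}
```

With ranges nested this way, a profiler timeline would show at a glance whether a given lookup paid for compilation or was served from the cache.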
Thanks for the RFE, I like the idea. (Also great to hear you plan to use Jitify more extensively).
Is there a particular way you would suggest reporting cache hit rates via NVTX?
Btw caching can be disabled by passing zero for max_in_mem and max_files when constructing ProgramCache, or by calling program_cache.resize(0).
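To illustrate why a zero size disables caching, here is a toy LRU cache. It is only a sketch of the general idea, not Jitify's `ProgramCache`: `ToyCache`, `count_builds`, and the string-valued "compiled program" are made up for the example, and only the `resize(0)` name mirrors the call mentioned above.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Toy LRU cache (NOT Jitify's ProgramCache). With capacity 0 nothing is
// retained, so every lookup falls through to the builder.
class ToyCache {
 public:
  using Builder = std::function<std::string(const std::string&)>;

  explicit ToyCache(std::size_t capacity) : capacity_(capacity) {}

  // Changes the capacity and drops entries that no longer fit.
  void resize(std::size_t capacity) {
    capacity_ = capacity;
    evict();
  }

  // Returns the cached value for key, building it on a miss.
  std::string get(const std::string& key, const Builder& build) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      // Hit: move the entry to the front to mark it most recently used.
      order_.splice(order_.begin(), order_, it->second);
      return it->second->second;
    }
    std::string value = build(key);
    if (capacity_ > 0) {
      order_.emplace_front(key, value);
      index_[key] = order_.begin();
      evict();
    }
    return value;
  }

 private:
  using Entry = std::pair<std::string, std::string>;

  // Removes least recently used entries until the cache fits its capacity.
  void evict() {
    while (order_.size() > capacity_) {
      index_.erase(order_.back().first);
      order_.pop_back();
    }
  }

  std::size_t capacity_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Counts how often the (fake) compiler runs across repeated lookups of the
// same kernel, as a benchmark loop would do.
inline int count_builds(std::size_t capacity, int iterations) {
  ToyCache cache(capacity);
  int builds = 0;
  auto compile = [&builds](const std::string& src) {
    ++builds;
    return "compiled:" + src;
  };
  for (int i = 0; i < iterations; ++i) cache.get("my_kernel", compile);
  return builds;
}
```

In this toy model, `count_builds(0, 5)` runs the builder on every iteration, while any nonzero capacity builds once and reuses the result. That is the behavior a benchmark needs in order to time compilation on each iteration.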
I've added NVTX integration in this commit: https://github.com/NVIDIA/jitify/commit/bf1c8c0531a9253d0a7c420fc5f35e90b79e4fad
It's in the https://github.com/NVIDIA/jitify/pull/131 branch, which I'm hoping to merge soon.
> Is there a particular way you would suggest reporting cache hit rates via NVTX?
As long as the NVTX regions are added for the entire compilation and caching process, that would be enough in the meantime for performance investigation. As for the cache hit rates, we just need to be able to query the hit rates at runtime, which is already done in your commit.
Thanks for promptly looking into this!
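For reference, the runtime hit-rate query we rely on amounts to something like the sketch below. The `CacheStats` type and its members are hypothetical names for illustration and are not taken from the linked commit.

```cpp
#include <cstddef>

// Hypothetical hit/miss counters; the names are illustrative, not Jitify's API.
struct CacheStats {
  std::size_t hits = 0;
  std::size_t misses = 0;

  // Records the outcome of one cache lookup.
  void record(bool hit) {
    if (hit) {
      ++hits;
    } else {
      ++misses;
    }
  }

  // Fraction of lookups served from the cache; 0.0 before any lookup.
  double hit_rate() const {
    const std::size_t total = hits + misses;
    if (total == 0) return 0.0;
    return static_cast<double>(hits) / static_cast<double>(total);
  }
};

// One cold miss followed by three warm hits.
inline double demo_hit_rate() {
  CacheStats stats;
  stats.record(false);
  for (int i = 0; i < 3; ++i) stats.record(true);
  return stats.hit_rate();
}
```

Polling a value like this between benchmark iterations is enough for our purposes, so the hit rate does not have to be emitted through NVTX itself.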