volk
test timings of ORC kernels include ORC compilation time
The timing data that the tests record for ORC kernels seems to include the ORC compilation time. For me the measured time is often greater than 1.0 s in these tests, whereas it is less than 200 ms on a second instantiation.
The test reports its best-choice kernel at the end, but it never picks ORC because of that long time at the start.
Interesting. Seems like volk_profile should be able to go through and compile each ORC kernel first, then use it for timing purposes. Are compiled ORC kernels cached somewhere? Can we store them like the FFTW wisdom? That would be awesome; I don't know enough about ORC to know how it does its thing. Please advise.
I think our profiling should simply make one or several calls to the benchmarked function before the actual measurements. All profiling can benefit from that.
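A minimal sketch of that warm-up idea, in case it helps the discussion. The helper name `time_after_warmup` and its parameters are hypothetical; nothing here is taken from volk_profile or the ORC API. It only assumes that the one-time cost (such as ORC compilation) is paid on the first call, so a few untimed calls before the measurement keep it out of the result.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper, not part of VOLK: returns the average time in
// seconds of one call to `kernel`, measured only after `warmup_calls`
// untimed calls have absorbed any one-time setup cost.
double time_after_warmup(const std::function<void()>& kernel,
                         std::size_t warmup_calls,
                         std::size_t timed_calls)
{
    assert(timed_calls > 0 && "need at least one timed call");

    // Untimed warm-up: any lazy compilation or cache fill happens here.
    for (std::size_t i = 0; i < warmup_calls; ++i) {
        kernel();
    }

    // Timed section: steady_clock is monotonic, so it suits intervals.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < timed_calls; ++i) {
        kernel();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(timed_calls);
}
```

A caller would wrap each kernel invocation in a lambda and pass, say, one or two warm-up calls. Whether a single warm-up call is enough for ORC depends on when ORC actually compiles, which I have not checked.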