gdrcopy
add autotuning support
Optimized memcpy implementations should be chosen at run time during a tuning phase, possibly in gdr_open().
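To make the proposal concrete, here is a minimal sketch of what such a tuning phase could look like. Everything in it is hypothetical and not part of the gdrcopy API: the `copy_fn_t` type, the two candidate routines, and `autotune_copy()` are illustrative names. It also times copies between plain host buffers, whereas a real tuner would presumably have to measure against the mapped GPU memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by all candidate copy routines. */
typedef void (*copy_fn_t)(void *dst, const void *src, size_t n);

/* Candidate 1: the C library memcpy. */
static void copy_libc(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Candidate 2: a plain byte loop, standing in for a hand-tuned variant.
 * The volatile destination keeps the compiler from turning the loop
 * back into a memcpy call. */
static void copy_bytewise(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

static const copy_fn_t candidates[] = { copy_libc, copy_bytewise };
#define NUM_CANDIDATES (sizeof(candidates) / sizeof(candidates[0]))

/* Wall-clock time in nanoseconds (C11 timespec_get; a monotonic clock
 * would be preferable in production code). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time every candidate over `iters` copies of `n` bytes and return the
 * one with the lowest total elapsed time. */
static copy_fn_t autotune_copy(void *dst, const void *src, size_t n, int iters)
{
    assert(dst && src && n > 0 && iters > 0);
    copy_fn_t best = candidates[0];
    uint64_t best_ns = UINT64_MAX;

    for (size_t c = 0; c < NUM_CANDIDATES; c++) {
        candidates[c](dst, src, n);              /* warm-up pass */
        uint64_t t0 = now_ns();
        for (int i = 0; i < iters; i++)
            candidates[c](dst, src, n);
        uint64_t elapsed = now_ns() - t0;
        if (elapsed < best_ns) {
            best_ns = elapsed;
            best = candidates[c];
        }
    }
    return best;
}
```

The idea would be to call something like `autotune_copy()` once, for example from gdr_open(), cache the returned function pointer, and use it for all later copies. Tuning separately per direction or per buffer size would be a straightforward extension if those turn out to matter.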
Curious: what dimensions are you going to be tuning over here in the autotuner?
@maddyscientist that is a good question. I am not expecting a dependency on the buffer size, but I might be wrong.
@drossetti BTW, is there a formula for calculating this? Otherwise it would depend on experimental values for each kind of HW configuration.
@hongbilu any performance model would be inherently HW dependent, so it would involve maintaining a database of figures of merit (FOMs) for each platform. That is why I was proposing a run-time autotuning phase instead.
@drossetti that would be a lot of work, and in theory the CPU's operating frequency and workload would also need to be considered. Experiments show that the CPU's operating frequency is a key factor.