Benchmark is easy to misuse
The main parameter besides resolution is the steps parameter. This suggests (to me at least) that the more steps you run, the more meaningful the benchmark becomes.
That is somewhat true, but it also depends heavily on the resolution. For example:
- You benchmark with a resolution of 4048 and 1000 steps. The result of the benchmark is 400 Mlups.
Basically you told the benchmark to calculate
4,048 nodes * 4,048 nodes * 1,000 steps = 16,386,304,000 node updates.
The benchmark took 16,386,304,000 node updates / 400,000,000 lups = ~41 seconds.
- You benchmark with a resolution of 16 and 1000 steps. The result of the benchmark is 350 Mlups.
Basically you told the benchmark to calculate
16 nodes * 16 nodes * 1,000 steps = 256,000 node updates.
The benchmark took 256,000 node updates / 350,000,000 lups = ~0.0007314 seconds.
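The arithmetic above can be sketched as a small helper. This is just an illustration of the calculation, not part of the benchmark's actual API; `benchmark_runtime` is a name I made up:

```python
def benchmark_runtime(resolution, steps, mlups):
    """Estimate how long a benchmark run takes, given its measured throughput.

    resolution -- grid edge length (domain is resolution x resolution nodes)
    steps      -- number of simulation steps
    mlups      -- measured throughput in million lattice updates per second
    """
    total_node_updates = resolution * resolution * steps
    return total_node_updates / (mlups * 1_000_000)  # seconds

print(benchmark_runtime(4048, 1000, 400))  # ~41 seconds
print(benchmark_runtime(16, 1000, 350))    # ~0.0007 seconds
```

The second call makes the problem obvious: at small resolutions, even 1,000 steps finish in well under a millisecond.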
There is one big issue in comparing these two benchmarks:
- The second benchmark finishes so quickly that its result cannot be accurate. GPU performance is too noisy for such a short run to produce a stable measurement.
- The first benchmark, on the other hand, runs long enough to be much more accurate.
What I described is not a bug, but it does lead the user to misuse the API.
I would consider changing the benchmark parameter from steps to either seconds or total nodes (to process).
A seconds parameter would also make the benchmark more useful for automation, since it is inconvenient to run a process that takes an unpredictable amount of time.
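A seconds-based benchmark could look something like the sketch below. This is a hypothetical illustration of the idea, not the project's actual code; `run_step` stands in for whatever advances the simulation by one step:

```python
import time

def benchmark_for_duration(resolution, target_seconds, run_step):
    """Run simulation steps until roughly target_seconds have elapsed,
    then report throughput in Mlups.

    run_step -- callable advancing the simulation by one step
                (hypothetical stand-in for the real kernel launch)
    """
    nodes_per_step = resolution * resolution
    steps = 0
    start = time.perf_counter()
    # Loop on wall-clock time instead of a fixed step count, so the
    # run length is predictable regardless of resolution.
    while time.perf_counter() - start < target_seconds:
        run_step()
        steps += 1
    elapsed = time.perf_counter() - start
    return (nodes_per_step * steps) / (elapsed * 1_000_000)
```

With this shape, both a 16-node and a 4048-node domain are measured over the same wall-clock window, so the noise floor is comparable between runs.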