VkFFT

Benchmark sources?

Open michaellarabel opened this issue 5 years ago • 6 comments

Hi, interesting project. I see the main page shows some VkFFT benchmarks compared against cuFFT. Do you plan to post the benchmark code used for running the VkFFT tests? From a quick look at Vulkan_FFT.cpp, it didn't seem to be part of that file. I'm asking because I'm evaluating VkFFT for possible use in GPU/driver combination testing.

Thanks, Michael

michaellarabel · Aug 02 '20

Hi, so far the benchmark was done by manually launching every system size, both in simple CUDA code and in Vulkan_FFT.cpp. In Vulkan_FFT.cpp, sample 0 corresponds to the R2C/C2R configuration described on the main page. Each FFT was performed ~10-1000 times and the timings were averaged. The whole process was then repeated a couple more times to double-check. I plan to release an automatic benchmarking script in the next few days.
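The repeat-and-average methodology described above can be sketched roughly as follows. This is a minimal C++ illustration, not the benchmark code from the repository: `averageMilliseconds` is a hypothetical helper, the single untimed warm-up call is an assumption, and the callback is assumed to submit one FFT and block until the GPU has finished (for example with `vkQueueWaitIdle` in Vulkan or `cudaDeviceSynchronize` in CUDA). Without that wait, only submission time would be measured.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of VkFFT): runs one untimed warm-up call,
// then times `iterations` calls of `runOnce` and returns the mean in milliseconds.
// `runOnce` must wait for the GPU to finish, or only submission time is measured.
double averageMilliseconds(const std::function<void()>& runOnce, int iterations) {
    assert(iterations > 0);
    runOnce();  // warm-up, excluded from the average (assumption)
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        runOnce();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / iterations;
}
```

Repeating the whole measurement a few times and comparing the resulting means, as described above, is a simple way to spot runs disturbed by clock ramp-up or background load.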

Best regards, Dmitrii

DTolm · Aug 03 '20

I have added benchmark scripts to the project. To launch the VkFFT benchmark, select sample 0 in Vulkan_FFT.cpp; for cuFFT, there is a benchmark_cuFFT.cu script. Both print their results to the terminal.

Best regards, Tolmachev Dmitrii

DTolm · Aug 03 '20

Hi, I would like to reopen this issue, as the code has advanced considerably since the initial commit. The last update made big improvements to how VkFFT/CUDA benchmarking works: there is now a large sample set covering small/medium and large systems, a gnuplot script that compares the results to the cuFFT benchmark (also uploaded), and two parameters that can be tuned for different GPUs: the amount of memory transferred to the chip per request and the size of the register file.

Results are fairly consistent between launches of the benchmark utility, so the difference between cuFFT and VkFFT can be clearly seen. It would be really interesting to see how the code behaves on different systems, as this will help refine it even further.

Best regards, Tolmachev Dmitrii

DTolm · Sep 29 '20

Thanks. The latest benchmark code seems to be working reliably. I added it to the Phoronix Test Suite and began running VkFFT benchmarks on a variety of cards.

Results so far can be found at https://openbenchmarking.org/test/pts/vkfft. I'm still testing other hardware configurations, so more data will accumulate over the coming days.

michaellarabel · Oct 01 '20

Big thanks for this, it will be extremely useful going forward. There are still some things I need to clarify about the code that can alter benchmark results between GPUs and are related to their architecture. The main one is the amount of coalesced memory, i.e. how much data is transferred per request from VRAM to the chip. It varies from GPU to GPU, and the values I know of are:

1. Nvidia Pascal and newer: 32 bytes
2. Nvidia before Pascal: 128 bytes
3. Intel: 64 bytes
4. AMD: should also be 64 bytes, but it might be 32 for RDNA

These values correspond to the coalescedMemory parameter, which can be modified. There is also a feature that can make GPUs with a big register file slightly faster, but I think it can be disabled for the benchmark with little effect on the score. Also, I think a larger number of systems will be better for averaging and for the final result, and I should switch the benchmark from the R2C case to C2C. I will update the benchmark script now so that it uses a set of ~50 systems for the precision calculation.
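Choosing a default coalescedMemory value from the per-vendor numbers above might look like the sketch below. It is an illustration under stated assumptions, not VkFFT's actual configuration code: `pickCoalescedMemory` and its 32-byte fallback for unknown vendors are hypothetical. The vendor constants are the standard PCI vendor IDs that Vulkan reports in `VkPhysicalDeviceProperties::vendorID`. Because the vendor ID alone cannot tell Nvidia generations apart, the caller has to supply the Pascal-or-newer flag.

```cpp
#include <cstdint>

// Standard PCI vendor IDs, as reported in VkPhysicalDeviceProperties::vendorID.
constexpr std::uint32_t kVendorNvidia = 0x10DE;
constexpr std::uint32_t kVendorAmd = 0x1002;
constexpr std::uint32_t kVendorIntel = 0x8086;

// Hypothetical helper: returns a coalescedMemory value in bytes, using the
// per-vendor values listed above. vendorID alone cannot identify the Nvidia
// architecture, so the caller supplies `nvidiaPascalOrNewer`.
constexpr std::uint32_t pickCoalescedMemory(std::uint32_t vendorID,
                                            bool nvidiaPascalOrNewer) {
    switch (vendorID) {
    case kVendorNvidia:
        return nvidiaPascalOrNewer ? 32 : 128;
    case kVendorIntel:
        return 64;
    case kVendorAmd:
        return 64;  // possibly 32 on RDNA; unconfirmed
    default:
        return 32;  // fallback for unknown vendors (assumption)
    }
}
```

The vendor ID would typically come from `vkGetPhysicalDeviceProperties`; how to detect the Nvidia architecture is left to the caller here.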

DTolm · Oct 01 '20

I have made some stability fixes to the benchmark and increased the sample pool for better averaging and use-case coverage. I have also added a beta version of vendor-specific settings configuration, so the benchmark should produce optimal results for most GPUs. If there is anything else you would like changed in the benchmark, feel free to ask!

DTolm · Oct 03 '20