
Battlemage perf in Phoronix benchmarks

Open eero-t opened this issue 11 months ago • 5 comments

Phoronix compared B580 performance to other GPUs using a Git build of the Linux 6.13 kernel, with compute-runtime v24.45.31740.9 and IGC v2.1.12.

While B580 perf was already better than A580 perf in most compute tests (as expected), there were also a couple where it was not yet:

  • https://www.phoronix.com/review/intel-arc-b580-gpu-compute/4
  • https://www.phoronix.com/review/intel-arc-b580-gpu-compute/2

The clpeak latency test (at the start of page 4) had the largest gap, which could also be an Xe kernel driver issue.

eero-t, Dec 18 '24 17:12

Thanks for the issue; we are working on improving submission latency.

If you are eager to build the driver yourself, you can try the commit below (it enables direct submission on BMG with XE_KMD): https://github.com/intel/compute-runtime/commit/fa2ff678fad3807846e050568e34717acdd2eee0

Data from our labs shows significant improvements in this test case.

MichalMrozek, Dec 18 '24 17:12

This is more of a reminder to mention the related improvements in the notes of the releases that include them.

Michael tracks this project's release notes, so he may do another Phoronix test round / article when he sees a mention of a fix for an issue highlighted in his earlier tests (which is good marketing for both :-)). I think his test suite differs from the one used by the compute-runtime team, so it gives extra information on what else improved, and what still remains (or regressed).

Btw, does direct submission need some specific kernel / Xe KMD version, or is it enough to have a kernel version where BMG does not need force-probing?

eero-t, Dec 18 '24 18:12

Direct submission relies mostly on VM_BIND, which is basic functionality in XE_KMD. So technically any XE KMD that recognizes BMG out of the box would suffice; however, since BMG is the first platform to enable Direct Submission, there may be some hiccups. In the release notes we indicate which kernel a given release was validated with.
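
As a rough illustration of checking the precondition above (this is not part of the original reply), here is a minimal Python sketch that reports which kernel driver each DRM card is bound to. It assumes a Linux system with the usual `/sys/class/drm` sysfs layout; the function name `gpu_kernel_drivers` is hypothetical. It only tells you whether a card is handled by `xe` (or `i915`), not whether direct submission is actually enabled in compute-runtime.

```python
import os
from pathlib import Path


def gpu_kernel_drivers(sysfs_drm="/sys/class/drm"):
    """Map each DRM card (e.g. 'card0') to the name of its bound kernel driver.

    Hypothetical helper: relies on the 'device/driver' symlink that sysfs
    exposes for a device bound to a kernel driver.
    """
    drivers = {}
    root = Path(sysfs_drm)
    if not root.is_dir():
        # Not Linux, or sysfs is not mounted: nothing to report.
        return drivers
    for card in sorted(root.glob("card[0-9]*")):
        # Skip connector entries such as 'card0-DP-1'.
        if "-" in card.name:
            continue
        link = card / "device" / "driver"
        if link.is_symlink():
            # The symlink target ends with the driver name, e.g. '.../drivers/xe'.
            drivers[card.name] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    for card, driver in gpu_kernel_drivers().items():
        print(f"{card}: {driver}")
```

If a B580 shows up here bound to `xe` without any force-probe kernel parameter, that would match the "recognizes BMG out of the box" condition described above; the kernel actually validated for a given release is still the one listed in its release notes.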

MichalMrozek, Dec 19 '24 07:12

I am proud to share that our upstream software stack now supports Direct Submission on BMG. More information about performance here -> https://www.phoronix.com/review/intel-b580-opencl-january

MichalMrozek, Jan 14 '25 02:01

> I am proud to share that our upstream software stack now supports Direct Submission on BMG. More information about performance here -> https://www.phoronix.com/review/intel-b580-opencl-january

Wow, that's a great improvement, from the worst latency to the best one!

And there are also very noticeable perf improvements in many other cases...

However, page 3 of that article shows a couple of benchmark subtests where Battlemage perf is still in last place:

  • Darktable 4.8.1: Server Rack (OpenCL)
  • Hashcat 6.2.4: SHA-512

And page 2 shows some OpenCL subtests where BMG perf is still below Arc perf:

  • SHOC 2020-04-17: MD5 Hash
  • SHOC 2020-04-17: FFT SP

Are the issues with those already known?


PS. Michael's earlier article also showed worse-than-Arc perf in some additional tests that were not included in this latest round:

  • Blender 4.3: Classroom & Junkshop (oneAPI)
  • FluidX3D 3.0: FP32-FP16C
    • perf in other FluidX3D subtests was good though

Any idea whether perf in those has improved?

eero-t, Jan 14 '25 09:01