chipStar
myocyte benchmark from HeCBench significantly slower with chipStar than with SYCL [LevelZero backend]
Without immediate queues, chipStar is ~100x slower than SYCL; with immediate queues it is ~10x slower. My initial examination points to many (possibly unnecessary) barrier commands, but this needs to be investigated.
You should run it through iprof to see if the kernels themselves aren't extra slow. Are atomics used?
There's something fishy with the immediate command lists (ICL) and the barriers. GAMESS fails with many of these errors flooding the log when I enable ICL:
CHIP error [TID 32150] [1692974685.965693710] : hipErrorTbd (ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY ) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:1351:enqueueBarrierImpl
CHIP error [TID 32150] [1692974685.975379722] : Caught Error: hipErrorTbd
CHIP error [TID 32150] [1692974685.977357058] : hipErrorTbd (ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY ) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:1351:enqueueBarrierImpl
This could be related to the reported issue here (suspected excessive barrier usage).
What GPU is being used?
In my case, an iGPU; in Michal's, a PVC.
I opened a separate issue (#612) for my problem above, which still occurs.
Removing the barriers (and using event dependencies instead) significantly reduced the difference (to ~4x slower), but there was also a kernel problem: SYCL was using fast-math by default, and the kernels call pow/exp a lot, so SYCL was using native_pow / native_exp. Recompiling the SYCL version without fast-math brought the difference down to 1.3x-1.4x.
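To illustrate why fast-math changes both speed and results for pow-heavy kernels, here is a minimal host-side sketch. It assumes that a "native" pow is computed roughly as exp2(y * log2(x)); that is a common formulation, but the actual native_pow implementation is device-specific and may differ. The names `approx_pow` and `rel_diff` are hypothetical and appear in neither benchmark.

```cpp
#include <cmath>

// Hypothetical stand-in for a "native" pow (an assumption, not the
// actual device implementation): exp2(y * log2(x)), valid for x > 0 only.
inline float approx_pow(float x, float y) {
    return std::exp2(y * std::log2(x));
}

// Relative difference between the approximation and std::pow,
// useful for checking how far the two code paths drift apart.
inline float rel_diff(float x, float y) {
    const float ref = std::pow(x, y);
    return std::fabs(approx_pow(x, y) - ref) / std::fabs(ref);
}
```

On a CPU this form is not necessarily faster; the point is only that the two variants are different computations, so a fair chipStar vs. SYCL timing comparison needs both sides built with the same math mode.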
30-40% is still significant. Any clue what drags chipStar down still?
@pjaaskel no, not yet.
@franz
Do you mean the SYCL compiler enables fast math by default? I checked the Makefile and it does not have the fast math flag.
This depends on the compiler. The Intel compiler icpx turns the fast math flag on (and also sets the optimization level to -O2) by default, while GCC and Clang do not.
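A quick way to check which math mode a given build actually uses is to test the `__FAST_MATH__` macro, which GCC and Clang predefine when `-ffast-math` is in effect. That icpx, being Clang-based, also defines it under its default floating-point model is an assumption here and should be verified on the installed compiler version. The helper name `fast_math_enabled` is hypothetical.

```cpp
// Reports whether this translation unit was compiled with fast-math.
// Relies on the __FAST_MATH__ predefined macro (GCC/Clang behavior;
// assumed, not confirmed, to hold for icpx defaults as well).
constexpr bool fast_math_enabled() {
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}
```

If this reports fast-math for the SYCL build, passing `-fno-fast-math` or `-fp-model=precise` to icpx should put both builds on equal footing.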