chipStar
myocyte benchmark from HeCBench significantly slower with chipStar than with SYCL [LevelZero backend]
Without immediate queues, chipStar is ~100x slower than SYCL; with immediate queues, it is ~10x slower. My initial examination seems to point to many (possibly unnecessary) barrier commands, but this needs to be investigated.