Add an experimental HSA backend
This PR proposes adding an experimental Heterogeneous System Architecture (HSA) backend for IREE. HSA provides standard APIs to manage and manipulate low-level device primitives such as queues, signals, and memory pools, and the proposed backend surfaces these primitives to the HAL. I marked the PR as a draft and would love your feedback. I will happily address any comments or discuss the code changes here (or walk through them on Teams). Jose and I are the coauthors of this work from AMD's RAD team.
Additional notes:
- The backend implementation started as a copy of the HIP backend (commit hash 9e95c38fdf1274e17eef521edc8536b3f10f791b), which I reduced to the bare minimum required for dispatching packets.
- The backend implements simple dispatching through a single queue.
- HIP events are replaced with barrier packets whose completion signals invoke a user-defined function when the packet is reached.
- The implementation only uses a fine-grained memory pool to service the allocations.
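
To illustrate the single-queue and barrier-packet notes above, here is a minimal host-only sketch of an in-order queue in which a barrier packet carries a completion signal and a callback runs once that packet is reached. All names (`ToySignal`, `Packet`, `ToyQueue`, `RunExample`) are hypothetical and nothing here calls HSA or ROCr; in the actual backend the queue is consumed by the device's packet processor, not by a host loop.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an HSA signal: a counter plus an optional handler
// that fires when the counter reaches zero.
struct ToySignal {
  int64_t value = 1;
  std::function<void()> on_complete;

  void Decrement() {
    assert(value > 0);
    if (--value == 0 && on_complete) on_complete();
  }
};

// Hypothetical packet: either a kernel dispatch or a barrier.
struct Packet {
  enum class Kind { kDispatch, kBarrier };
  Kind kind;
  std::function<void()> work;  // Only used by dispatch packets.
  ToySignal* completion;       // Decremented when the packet retires; may be null.
};

// Single in-order queue. Drain() plays the role of the packet processor.
class ToyQueue {
 public:
  void Submit(Packet packet) { packets_.push_back(std::move(packet)); }

  void Drain() {
    while (!packets_.empty()) {
      Packet packet = std::move(packets_.front());
      packets_.pop_front();
      if (packet.kind == Packet::Kind::kDispatch && packet.work) packet.work();
      // Packets retire in order, so every earlier packet has finished by the
      // time a barrier's completion signal is decremented.
      if (packet.completion) packet.completion->Decrement();
    }
  }

 private:
  std::deque<Packet> packets_;
};

// Two dispatches followed by a barrier that stands in for an event.
inline std::vector<int> RunExample() {
  std::vector<int> trace;
  ToySignal event;
  event.on_complete = [&trace] { trace.push_back(99); };

  ToyQueue queue;
  queue.Submit({Packet::Kind::kDispatch, [&trace] { trace.push_back(1); }, nullptr});
  queue.Submit({Packet::Kind::kDispatch, [&trace] { trace.push_back(2); }, nullptr});
  queue.Submit({Packet::Kind::kBarrier, nullptr, &event});
  queue.Drain();
  return trace;
}
```

Calling `RunExample()` records both dispatches before the callback attached to the barrier's signal, which is the ordering an event would otherwise provide.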
Apart from the issues listed below, all other unit tests (106) are passing. I tested on gfx1103 with ROCm 6.2.0 but can and will test on other chips as well.
Known issues (at the moment):
- ROCr lacks some of the async memory-copy and fill APIs. Resolving these will require feature requests in ROCr or custom kernels in IREE.
- Deferred execution is currently failing (I believe this is because the binding tables are not passed through correctly, which is a recent change that is not incorporated here).
- Some of the module run tests are also currently failing (see the CMakeLists.txt files for the tests).
- Semaphore tests `WaitThenFail` and `MultiWaitThenFail` are failing at the moment.
Some possible future improvements:
- It is possible to implement a graph-based command buffer in which the execution graph can be assembled from queues, signals, packets, and barrier dispatches chained together.
- Different memory pools (e.g., coarse- and fine-grained) can be used to service allocations for different coherence guarantees.
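
As a rough illustration of the memory-pool idea above, the following host-only sketch picks a pool per allocation based on the coherence the request needs. Everything here is hypothetical (`ToyPool`, `AllocRequest`, `SelectPool`), and the policy itself is an assumption: requests that need fine-grained coherence only use fine-grained pools, while other requests prefer a coarse-grained pool and fall back to a fine-grained one. A real implementation would query the pools from the runtime instead of using a hand-built list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coherence classes, loosely modelled on fine- and
// coarse-grained memory pools.
enum class Granularity { kFine, kCoarse };

// Hypothetical pool descriptor with simple byte accounting.
struct ToyPool {
  Granularity granularity;
  size_t capacity;
  size_t used;
};

// Hypothetical allocation request.
struct AllocRequest {
  size_t size;
  bool needs_fine_grained;  // Assumption: set for buffers shared coherently with the host.
};

// Returns the first pool of the given granularity with enough free space.
inline ToyPool* FindPool(std::vector<ToyPool>& pools, Granularity granularity,
                         size_t size) {
  for (ToyPool& pool : pools) {
    assert(pool.used <= pool.capacity);
    if (pool.granularity == granularity && pool.capacity - pool.used >= size) {
      return &pool;
    }
  }
  return nullptr;
}

// Assumed policy: prefer coarse-grained unless fine-grained is required, then
// fall back to fine-grained. Returns nullptr if no pool can serve the request.
inline ToyPool* SelectPool(std::vector<ToyPool>& pools, const AllocRequest& request) {
  ToyPool* pool = nullptr;
  if (!request.needs_fine_grained) {
    pool = FindPool(pools, Granularity::kCoarse, request.size);
  }
  if (!pool) pool = FindPool(pools, Granularity::kFine, request.size);
  if (pool) pool->used += request.size;
  return pool;
}
```

With one fine-grained and one coarse-grained pool, a request that does not need fine-grained coherence lands in the coarse-grained pool while it has room, and only spills into the fine-grained pool afterwards.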