unified-runtime
Note that this change includes a specification change: urProgramLink now requires the output parameter to contain either nullptr or some unspecified binary on failure. As well as this change, a...
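The failure contract described above can be illustrated with a minimal sketch. It does not call the real `urProgramLink`; `FakeProgram`, `FakeResult`, and `fakeLink` are hypothetical stand-ins, and the sketch picks the `nullptr` option of the two outcomes the description allows on failure.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not Unified Runtime types.
struct FakeProgram { bool hasBinary; };
enum class FakeResult { Success, LinkFailure };

// Hypothetical link function: the output parameter is written on every
// path, so a caller never observes a stale handle after a failure.
inline FakeResult fakeLink(bool inputsAreValid, FakeProgram **outProgram) {
    assert(outProgram != nullptr);
    *outProgram = nullptr;  // defined value even when linking fails
    if (!inputsAreValid) {
        return FakeResult::LinkFailure;
    }
    static FakeProgram linked{true};  // placeholder for a linked binary
    *outProgram = &linked;
    return FakeResult::Success;
}
```

With a contract of this shape, a caller can check the output handle after a failed link instead of assuming it was left untouched.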
This patch: - refactors options handling. - for use-after-free, does not try to get allocated/released info when quarantine is not enabled (no such info exists anyway). - for findAllocInfoByAddress(), adds an assertion...
The L0 GPU runtime divides the device memory address space equally among all GPU devices. So, if there are multiple GPU devices, the device sanitizer may not be able to...
This PR attempts to implement the API `urKernelGetSuggestedLocalWorkSize`, discussed in https://github.com/oneapi-src/unified-runtime/issues/1270. SYCLOS PR: https://github.com/intel/llvm/pull/12902 It also fixes: - For Level-Zero: when `LocalWorkSize` is provided, `urEnqueueKernelLaunch()` reads `LocalWorkSize` without respecting `workDim`.
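The `workDim` issue mentioned above can be sketched as follows. This is an illustrative helper, not the adapter's actual code: `paddedLocalSize` is a hypothetical name, and the assumption is that unused dimensions should default to 1 rather than be read from the caller's array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Unified Runtime. It reads only the first
// workDim entries of the caller's array and pads the remaining dimensions
// with 1, so a 1D or 2D launch never reads past the end of the array.
inline std::array<std::size_t, 3> paddedLocalSize(
    std::uint32_t workDim, const std::size_t *localWorkSize) {
    assert(workDim >= 1 && workDim <= 3);
    assert(localWorkSize != nullptr);
    std::array<std::size_t, 3> out{1, 1, 1};
    for (std::uint32_t i = 0; i < workDim; ++i) {
        out[i] = localWorkSize[i];
    }
    return out;
}
```

For a 1D launch with a single-element array, only element 0 is read; indexing elements 1 and 2 unconditionally would be an out-of-bounds read.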
CI in LLVM/SYCL: https://github.com/intel/llvm/pull/14536
Since EvStart and EvEnd are recorded directly after one another in `urEnqueueTimestampRecordingExp`, we can just copy EvStart to make EvEnd, instead of calling cuEventRecord for both `EvStart` and `EvEnd`, one...
I recently created two PRs (#1508 and #1509) that made simple changes to [.github/labeler.yml](https://github.com/oneapi-src/unified-runtime/blob/main/.github/labeler.yml), yet both triggered the full CI pipeline. These have no effect on UR spec/source code at all...
Is there a SYCL function for cudaOccupancyMaxActiveBlocksPerMultiprocessor? Some use cases are listed below. Thanks.
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h: result = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h: cudart_result = cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h: CUTLASS_TRACE_HOST(" cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags() returned error "
Also attempts to clarify the wording around this a bit. Addresses #558. LLVM testing: https://github.com/intel/llvm/pull/12270