unified-runtime issues

[UR] Improve handling of error cases in urProgramLink

1

Note that this change includes a specification change: urProgramLink now requires the output parameter to contain either nullptr or some unspecified binary on failure. As well as this change, a...

RossBrunton

loader

conformance

specification

experimental

level-zero

cuda

hip

opencl

native-cpu

Add CommandListCache abstraction to context

9

igchor

level-zero

[DeviceSanitizer] refactor options handling and fix use-after-free related problems

1

This patch: - refactors options handling. - for use-after-free, does not try to get allocated/released info when quarantine is not enabled (no such info exists anyway). - for findAllocInfoByAddress(), adds an assertion...

yingcong-wu

loader

sanitizer

[DeviceSanitizer] Change ASan shadow scale from 3 to 4

The L0 GPU runtime divides the device memory address space equally among all GPU devices. So, if there are multiple GPU devices, the device sanitizer may not be able to...

zhaomaosu

loader

sanitizer
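
To illustrate what a shadow scale change means in general, here is a minimal sketch of the usual AddressSanitizer-style mapping, where each shadow byte covers `2^scale` application bytes. This is not the Unified Runtime implementation: the helper names and the `offset` parameter are hypothetical, and the real layout used by the device sanitizer may differ.

```cpp
#include <cstdint>

// Hypothetical helpers, not taken from the Unified Runtime sources.

// Bytes of shadow memory needed to cover appBytes of application memory
// when one shadow byte describes (1 << scale) application bytes.
constexpr uint64_t shadowBytes(uint64_t appBytes, unsigned scale) {
    return appBytes >> scale;
}

// Classic ASan-style address mapping: shadow = (addr >> scale) + offset.
// The offset value is an assumption supplied by the caller.
constexpr uint64_t memToShadow(uint64_t addr, unsigned scale, uint64_t offset) {
    return (addr >> scale) + offset;
}
```

Under these assumptions, moving from scale 3 to scale 4 halves the shadow memory needed for the same address range, at the cost of coarser granularity (16 bytes per shadow byte instead of 8).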

Implement urKernelGetSuggestedLocalWorkSize

29

This PR tries to implement the API `urKernelGetSuggestedLocalWorkSize`, discussed in https://github.com/oneapi-src/unified-runtime/issues/1270. SYCLOS PR: https://github.com/intel/llvm/pull/12902 It also fixes: - For Level-Zero: when `LocalWorkSize` is provided, `urEnqueueKernelLaunch()` reads `LocalWorkSize` without respecting `workDim`.

yingcong-wu

loader

conformance

specification

level-zero

cuda

hip

opencl

ready to merge

native-cpu

sanitizer
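
The `workDim` fix described in the entry above can be pictured with a small sketch. It assumes the caller passes an array holding exactly `workDim` entries, so reading a fixed three elements would run past the end. The helper name `normalizeLocalWorkSize` is hypothetical and is not the code from the PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Unified Runtime: reads only the first
// workDim entries of the caller's local work size and pads the remaining
// dimensions with 1, so unused dimensions are never read from the array.
std::array<uint32_t, 3> normalizeLocalWorkSize(uint32_t workDim,
                                               const size_t *localWorkSize) {
    std::array<uint32_t, 3> wg = {1, 1, 1};
    for (uint32_t i = 0; i < workDim && i < 3; ++i) {
        wg[i] = static_cast<uint32_t>(localWorkSize[i]);
    }
    return wg;
}
```

With a one-dimensional launch and a single-element array, only index 0 is read and the other two dimensions default to 1.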

[L0] Refactoring of boolean event parameters

4

CI in LLVM/SYCL: https://github.com/intel/llvm/pull/14536

winstonzhang-intel

level-zero

[CUDA][HIP] Minimize native events recorded and created by urEnqueueTimestampRecordingExp

1

Since EvStart and EvEnd are recorded directly after one another in `urEnqueueTimestampRecordingExp`, we can just copy EvStart to make EvEnd, instead of calling cuEventRecord for both `EvStart` and `EvEnd`, one...

hdelan

cuda

hip

Ignore particular files for CI runs

4

I recently created two PRs (#1508 and #1509) which were simple changes to [.github/labeler.yml](https://github.com/oneapi-src/unified-runtime/blob/main/.github/labeler.yml) which triggered the full CI pipeline. These have no effect on UR spec/source code at all...

martygrant

ci/cd

cudaOccupancyMaxActiveBlocksPerMultiprocessor

2

Is there a SYCL function for cudaOccupancyMaxActiveBlocksPerMultiprocessor? Some use cases are listed below. Thanks.
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h: result = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h: cudart_result = cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h: CUTLASS_TRACE_HOST(" cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags() returned error "

jinz2014

cuda

Correct level of indirection used in KernelSetArgPointer calls.

6

Also attempts to clarify the wording around this a bit. Addresses #558. LLVM testing: https://github.com/intel/llvm/pull/12270

aarongreig

loader

conformance

specification

level-zero

cuda

hip

opencl

native-cpu

sanitizer
