
Add OpenCL EP

luyhcsu opened this issue 10 months ago • 8 comments

Description

This PR adds OpenCL execution provider (EP) support, enabling ONNX model execution across a variety of accelerators through OpenCL.

1. Enhance the OpenCL EP based on the OpenCL 1.2 standard.
2. Add basic operator support for LLMs such as Qwen2_5 and Llama2_7B_Chat.
3. Add C/C++ and Python API support.
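For illustration, a minimal sketch of how the Python API support might be used once this lands. The provider name `OpenCLExecutionProvider` and the fallback ordering are assumptions based on the naming conventions of other EPs, not something confirmed by this PR.

```python
# Minimal sketch: running a model on the OpenCL EP via the Python API.
# Assumes the provider registers as "OpenCLExecutionProvider" (hypothetical name
# following ONNX Runtime conventions) and that CPU is kept as a fallback.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenCLExecutionProvider", "CPUExecutionProvider"],
)

# Feed a dummy float32 input matching the model's first input, using 1 for symbolic dims.
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
print(session.get_providers(), outputs[0].shape)
```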

Motivation and Context

luyhcsu avatar Feb 27 '25 02:02 luyhcsu

Can any maintainer approve the CI workflows? Thanks.

luyhcsu avatar Mar 17 '25 01:03 luyhcsu

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline,Windows ARM64 QNN CI Pipeline

justinchuby avatar Apr 17 '25 21:04 justinchuby

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

azure-pipelines[bot] avatar Apr 17 '25 21:04 azure-pipelines[bot]

/azp run Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU CUDA CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Linux Android Emulator QNN CI Pipeline,Big Models

justinchuby avatar Apr 17 '25 21:04 justinchuby

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

azure-pipelines[bot] avatar Apr 17 '25 21:04 azure-pipelines[bot]

Could you rebase from main?

justinchuby avatar Apr 17 '25 21:04 justinchuby

@luyhcsu Are you still aiming to merge this in? If not, do you mind if someone else picks it up?

mitchelldehaven avatar Jun 09 '25 17:06 mitchelldehaven

@luyhcsu Are you still aiming to merge this in? If not, do you mind if someone else picks it up?

Appreciate you checking in! I do plan to finish this—just need a bit more time to tidy things up. Totally open to others jumping in with suggestions or improvements too.

luyhcsu avatar Jun 10 '25 10:06 luyhcsu

It should be pointed out that this is a continuation of the effort in #10830. I was the main developer of the original feature branch during my time at Microsoft (I am not there anymore).

Let me add some context for this.

  1. Almost every OpenCL C compiler is a bug-ridden C compiler embedded in the kernel driver.
    • The exception is NVIDIA's compiler, not /s!
    • Be prepared for the workarounds!
  2. The design of pointer-backed versus opaque Texture/Image-backed tensor types was never finished. There is no caching allocator for Texture/Image-backed opaque tensors, only a very simple one with an LRU cache.
    • This is the main pain point.
    • Some hardware vendors do not know how to properly design a GPU (ARM! the Mali GPU). Shared memory (in CUDA terms; local memory in OpenCL terms) is backed by DRAM, so programmers lose the programmer-controlled cache and cannot write high-performance kernels with flat pointers. To regain some caching, a common trick is to go through the texture cache of the texture unit, but encoding and decoding tensor coordinates for that is awful (see the sketch after this list).
  3. The original branch was not merged because we concluded there was not enough man-power (FTE) to maintain another EP.
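To make the coordinate-encoding pain in point 2 concrete, here is a minimal Python sketch of one common image-backed layout (RGBA channel packing, batches stacked along the image height). The layout is illustrative only and is not claimed to be the one used by this PR or the original branch; the point is that every kernel touching such a tensor has to carry this encode/decode logic.

```python
# Sketch of why texture-backed tensors make coordinate math awkward.
# Hypothetical layout: an NCHW float tensor packed into an RGBA image,
# 4 channels per texel, width = ceil(C/4) * W, height = N * H.

def nchw_to_image_coord(n, c, h, w, H, W):
    """Encode tensor element (n, c, h, w) as image (x, y) plus an RGBA lane."""
    c_block, lane = divmod(c, 4)      # group channels into texels of 4
    x = c_block * W + w               # channel blocks side by side along image width
    y = n * H + h                     # batches stacked along image height
    return x, y, lane

def image_coord_to_nchw(x, y, lane, H, W):
    """Decode image (x, y, lane) back into tensor indices (n, c, h, w)."""
    c_block, w = divmod(x, W)
    n, h = divmod(y, H)
    return n, c_block * 4 + lane, h, w

# Round-trip check for a 1x6x4x5 tensor (image would be ceil(6/4)*5 = 10 wide, 4 tall).
N, C, H, W = 1, 6, 4, 5
assert image_coord_to_nchw(*nchw_to_image_coord(0, 5, 3, 2, H, W), H, W) == (0, 5, 3, 2)
```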

cloudhan avatar Jun 24 '25 09:06 cloudhan