llvm [SYCL][Graph] Optimize graph enqueue for in-order queues

Optimizes the enqueue() function of SYCL graphs to bypass the scheduler whenever possible and to avoid creating events when they are not needed.

Refactors the executable graph enqueue() to have different paths depending on workload:
- The direct path will be used when there are no host-tasks or accessor requirements in the graph and the execution dependencies are considered safe to bypass the scheduler.
- The scheduler path will be used when there are requirements in the graph but no host-tasks, or if the execution dependencies require using the scheduler.
- The multiple partitions path will be used when the graph contains host-tasks, which require scheduling multiple graph partitions. The implementation was also changed to avoid adding unnecessary event dependencies to partition executions and to avoid copying CGData when possible.

Extends the changes in https://github.com/intel/llvm/pull/18277 to SYCL graphs. This means that no implicit events will be created when using in-order queues and graphs without host-tasks. Also updates the handler to only request events from the graph enqueue() when they are needed.
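
For reviewers, the path selection described in the list above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `GraphInfo`, `EnqueuePath`, and `selectEnqueuePath` are hypothetical names, and giving host-tasks priority over the other conditions is an assumption about how overlapping cases are resolved.

```cpp
// Hypothetical summary of an executable graph. The field names are
// illustrative only and do not mirror the runtime's data structures.
struct GraphInfo {
  bool HasHostTasks;              // graph contains at least one host-task
  bool HasAccessorRequirements;   // graph nodes carry accessor requirements
  bool DepsSafeToBypassScheduler; // execution dependencies allow skipping the scheduler
};

// The three enqueue paths named in the description.
enum class EnqueuePath { Direct, Scheduler, MultiplePartitions };

// Picks an enqueue path from the graph summary.
// Assumption: host-tasks are checked first, so a graph with host-tasks
// always takes the multiple partitions path.
constexpr EnqueuePath selectEnqueuePath(const GraphInfo &G) {
  if (G.HasHostTasks)
    return EnqueuePath::MultiplePartitions;
  if (G.HasAccessorRequirements || !G.DepsSafeToBypassScheduler)
    return EnqueuePath::Scheduler;
  return EnqueuePath::Direct;
}

// Compile-time checks of the three cases from the list.
static_assert(selectEnqueuePath({false, false, true}) == EnqueuePath::Direct,
              "no host-tasks, no requirements, safe dependencies");
static_assert(selectEnqueuePath({false, true, true}) == EnqueuePath::Scheduler,
              "requirements without host-tasks");
static_assert(selectEnqueuePath({true, false, true}) ==
                  EnqueuePath::MultiplePartitions,
              "host-tasks present");
```

In this sketch only the direct path skips the scheduler entirely, which is the case where an in-order queue should not need an implicit event.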

Jun 03 '25 16:06 fabiomestre

@cperkinsintel, final review is on you.

Jun 12 '25 17:06 aelovikov-intel

@intel/llvm-gatekeepers Can this PR be merged? The existing failures seem to be CI issues. This PR was green for those jobs before:

Intel-Arc jobs:

Cuda UR Job:

There were only unrelated changes to the HIP UR adapter made since then.

Edit: Never mind, recently merged commits broke this PR and it needs to be rebased.

Jun 17 '25 12:06 fabiomestre

@intel/llvm-gatekeepers This PR is ready to merge. The CI failure on PVC is unrelated (I have seen it in other PRs).

Jun 18 '25 18:06 fabiomestre

> @intel/llvm-gatekeepers This PR is ready to merge. The CI failure on PVC is unrelated (I have seen it in other PR's).

Failing tests are unrelated and tracked by https://github.com/intel/llvm/issues/18932

Jun 18 '25 19:06 uditagarwal97