[SYCL][Graph] Deadlock when using ext_oneapi_set_external_event to transition queue into recording mode

Open mmichel11 opened this issue 2 months ago • 0 comments

Initially reported by @slawekptak

Describe the bug

Calling ext_oneapi_set_external_event on an in-order queue that is not recording, with an event from a queue that is recording, should transition the non-recording queue into recording mode on its next submission. Instead, that submission deadlocks. The cause is that the implementation attempts to acquire the queue lock twice. This should be fixed to enable graph compatibility with this extension.
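To illustrate the class of bug, here is a minimal sketch of a common way to avoid taking the same non-recursive lock twice on one thread. It is not the DPC++ runtime code: `ToyQueue` and all of its members are hypothetical names invented for this example, and the actual fix in the implementation may look different. The idea is that public entry points take the lock once and delegate to a private helper that assumes the lock is already held.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a queue implementation (not the DPC++ runtime).
class ToyQueue {
public:
  // Marks that the next submission must follow an external event that
  // belongs to a recording queue.
  void setExternalEvent() {
    std::lock_guard<std::mutex> Guard(Mutex);
    PendingExternalEvent = true;
  }

  // Public entry point: takes the lock, then delegates.
  void beginRecording() {
    std::lock_guard<std::mutex> Guard(Mutex);
    beginRecordingLocked();
  }

  // Public entry point: takes the lock exactly once.
  void submit(int Cmd) {
    std::lock_guard<std::mutex> Guard(Mutex);
    if (PendingExternalEvent) {
      // Calling the public beginRecording() here would lock Mutex a second
      // time on the same thread. For std::mutex that is undefined behavior
      // and in practice usually hangs. Use the helper that assumes the lock
      // is already held instead.
      beginRecordingLocked();
      PendingExternalEvent = false;
    }
    (Recording ? Recorded : Executed).push_back(Cmd);
  }

  bool isRecording() {
    std::lock_guard<std::mutex> Guard(Mutex);
    return Recording;
  }

  std::size_t recordedCount() {
    std::lock_guard<std::mutex> Guard(Mutex);
    return Recorded.size();
  }

  std::size_t executedCount() {
    std::lock_guard<std::mutex> Guard(Mutex);
    return Executed.size();
  }

private:
  // Precondition: the caller already holds Mutex.
  void beginRecordingLocked() { Recording = true; }

  std::mutex Mutex;
  bool Recording = false;
  bool PendingExternalEvent = false;
  std::vector<int> Recorded;
  std::vector<int> Executed;
};
```

With this layout, calling `setExternalEvent()` followed by `submit()` switches the toy queue into recording mode without re-locking. A recursive mutex would also avoid the hang, but the locked-helper split keeps lock ownership explicit.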

To reproduce

The code snippet below reproduces the issue. It hangs during the submission to Q2.

// Compilation clang++ -fsycl transitive_set_external_event.cpp -o transitive_set_external_event
#include <sycl/sycl.hpp>

#include <cassert>
#include <iostream>
#include <numeric>
#include <vector>

using namespace sycl;
namespace exp_ext = ext::oneapi::experimental;

int main() {
  constexpr size_t Size = 128;

  device Dev = device::get_devices()[0];
  context Ctx{Dev};

  queue Q1{Ctx, Dev, {property::queue::in_order{}}};
  queue Q2{Ctx, Dev, {property::queue::in_order{}}};

  std::vector<int> HostA(Size), HostB(Size);
  std::iota(HostA.begin(), HostA.end(), 1);
  std::iota(HostB.begin(), HostB.end(), 100);

  int *A = malloc_device<int>(Size, Q1);
  int *B = malloc_device<int>(Size, Q1);

  Q1.copy(HostA.data(), A, Size);
  Q1.copy(HostB.data(), B, Size);
  Q1.wait_and_throw();

  exp_ext::command_graph Graph{Ctx, Dev};

  // Begin recording on Q1
  Graph.begin_recording(Q1);

  // Submit a small kernel on Q1 that increments A
  auto E1 = Q1.submit([&](handler &h) {
    h.parallel_for(range<1>{Size}, [=](id<1> i) { A[i] += 1; });
  });
 
  // Set external event to depend on E1 for Q2.
  Q2.ext_oneapi_set_external_event(E1);

  // Submissions to Q2 should be considered part of the same graph due to
  // the external event linking into recording mode. Submit a kernel on Q2
  // that multiplies B by 2 and depends implicitly on the external event.
  auto E2 = Q2.submit([&](handler &h) {
    h.parallel_for(range<1>{Size}, [=](id<1> i) { B[i] *= 2; });
  });
  
  Graph.end_recording();

  // Finalize the graph into an executable
  auto Exec = Graph.finalize();

  Q1.ext_oneapi_graph(Exec);
  Q1.wait_and_throw();

  // Copy results back to host for verification
  std::vector<int> OutA(Size), OutB(Size);
  Q1.copy(A, OutA.data(), Size);
  Q1.copy(B, OutB.data(), Size);
  Q1.wait_and_throw();

  // Verify expected result computed on host
  for (size_t i = 0; i < Size; ++i) {
    int expectedA = HostA[i] + 1;
    int expectedB = HostB[i] * 2;
    if (OutA[i] != expectedA || OutB[i] != expectedB) {
      std::cerr << "Mismatch at " << i << ": got (" << OutA[i] << ", "
                << OutB[i] << ") expected (" << expectedA
                << ", " << expectedB << ")\n";
      return 1;
    }
  }

  // Cleanup
  free(A, Q1);
  free(B, Q1);

  std::cout << "PASS\n";
  return 0;
}

Environment

  • DPC++ version: reproduced with 29435fc9e1be
  • Other environment details not relevant for producing deadlock

Additional context

No response

mmichel11, Nov 05 '25 15:11