llvm
llvm copied to clipboard
[SYCL] Implement sycl_ext_oneapi_profiling_tag extension
This commit adds the implementation of the sycl_ext_oneapi_profiling_tag extension. Moving the extension to experimental will happen in a follow-up patch.
These changes are ready for review despite being in draft.
Do you have expectations of what will happen when calling
sycl::ext::oneapi::experimental::submit_profiling_tag(Queue)on a queue that's in the graph recording state from sycl_ext_oneapi_graph?
I think there's no new issue here. Even before this extension, an application could get an event from a command that was submitted to a recorded graph. The graph extension specification says that attempting to call event::get_profiling_info() throws an exception in such a case. The events returned by submit_profiling_tag would behave the same for a recorded queue.
We've got a PR that adds node level profiling to SYCL-Graph in #12592 and would be good to make sure the directions are aligned.
I don't recall seeing that one before, and I don't see @intel/dpcpp-specification-reviewers on the reviewer list. Should we wait to review?
I think there's no new issue here. Even before this extension, an application could get an event from a command that was submitted to a recorded graph. The graph extension specification says that attempting to call
event::get_profiling_info()throws an exception in such a case. The events returned bysubmit_profiling_tagwould behave the same for a recorded queue.
That's a good point, I agree there shouldn't be an issue there.
I was also thinking that there is no equivalent to the piEnqueueTimestampRecordingExp entry-point in the UR command-buffer extension, so it's not just that we can't call event::get_profiling_info() on the event returned for that node in a queue recorded graph, but that we won't be able to create the node at all. Maybe we need to create an empty node as a placeholder so that the barrier semantics of submit_profiling_tag on an out-of-order queue are preserved. The graphs team could do this as follow on work like for other extensions that introduce queue enqueue commands, an exception about unknown command-group type should be thrown currently if trying to finalize a graph with a CG::CGTYPE::ProfilingTag command-group in it.
I don't recall seeing that one before, and I don't see @intel/dpcpp-specification-reviewers on the reviewer list. Should we wait to review?
The @intel/dpcpp-specification-reviewers group didn't get added automatically, I'll tag them now in that PR. I think it's worth reviewing now with any compatibility to the current sycl_ext_oneapi_profiling_tag design in mind, unless changes to sycl_ext_oneapi_profiling_tag are planned.
https://github.com/oneapi-src/unified-runtime/pull/1400 has been now merged, please update this PR to point at the UR tag at:
commit 7ce68e09b47f6f7f7ad61b15c302a19462bbe887
Merge: 5a23b18e 84bad6c7
Author: Kenneth Benzie (Benie) <[email protected]>
Date: Wed May 8 13:25:25 2024 +0100
Merge pull request #1400 from steffenlarsen/steffen/record_event
[UR][L0][CUDA][HIP] Add enqueue timestamp recording extension
UR changes have been merged (though a small fix patch https://github.com/oneapi-src/unified-runtime/pull/1588 is needed) so this is ready for review.
Tag @intel/llvm-reviewers-cuda @intel/dpcpp-tools-reviewers @intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers
@ldrumm - Friendly ping.
@ldrumm - Friendly ping.
@ldrumm - Friendly ping.
Failure due to include of sycl/sycl.hpp addressed by https://github.com/intel/llvm/pull/13970.
profiling_queue.cpp will be temporarily disabled for FPGA emulator while the failure is being investigated: https://github.com/intel/llvm/pull/13971
Is there an estimated timeline for when this feature will be available in an upcoming Intel compiler release?
Hi @stgeke! Thank you for the interest. I cannot comment on the timeline, but this is expected to be part of the next compiler release.