llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL] Implement sycl_ext_oneapi_profiling_tag extension

Open steffenlarsen opened this issue 1 year ago • 6 comments

This commit adds the implementation of the sycl_ext_oneapi_profiling_tag extension. Moving the extension to experimental will happen in a follow-up patch.

steffenlarsen avatar Feb 27 '24 15:02 steffenlarsen

These changes are ready for review despite being in draft.

steffenlarsen avatar Mar 05 '24 16:03 steffenlarsen

Do you have expectations of what will happen when calling sycl::ext::oneapi::experimental::submit_profiling_tag(Queue) on a queue that's in the graph recording state from sycl_ext_oneapi_graph?

I think there's no new issue here. Even before this extension, an application could get an event from a command that was submitted to a recorded graph. The graph extension specification says that attempting to call event::get_profiling_info() throws an exception in such a case. The events returned by submit_profiling_tag would behave the same for a recorded queue.

We've got a PR that adds node level profiling to SYCL-Graph in #12592 and would be good to make sure the directions are aligned.

I don't recall seeing that one before, and I don't see @intel/dpcpp-specification-reviewers on the reviewer list. Should we wait to review?

gmlueck avatar Mar 11 '24 20:03 gmlueck

I think there's no new issue here. Even before this extension, an application could get an event from a command that was submitted to a recorded graph. The graph extension specification says that attempting to call event::get_profiling_info() throws an exception in such a case. The events returned by submit_profiling_tag would behave the same for a recorded queue.

That's a good point, I agree there shouldn't be an issue there.

I was also thinking that there is no equivalent to the piEnqueueTimestampRecordingExp entry-point in the UR command-buffer extension, so it's not just that we can't call event::get_profiling_info() on the event returned for that node in a queue recorded graph, but that we won't be able to create the node at all. Maybe we need to create an empty node as a placeholder so that the barrier semantics of submit_profiling_tag on an out-of-order queue are preserved. The graphs team could do this as follow on work like for other extensions that introduce queue enqueue commands, an exception about unknown command-group type should be thrown currently if trying to finalize a graph with a CG::CGTYPE::ProfilingTag command-group in it.

I don't recall seeing that one before, and I don't see @intel/dpcpp-specification-reviewers on the reviewer list. Should we wait to review?

The @intel/dpcpp-specification-reviewers group didn't get added automatically, I'll tag them now in that PR. I think it's worth reviewing now with any compatibility to the current sycl_ext_oneapi_profiling_tag design in mind, unless changes to sycl_ext_oneapi_profiling_tag are planned.

EwanC avatar Mar 12 '24 09:03 EwanC

https://github.com/oneapi-src/unified-runtime/pull/1400 has been now merged, please update this PR to point at the UR tag at:

commit 7ce68e09b47f6f7f7ad61b15c302a19462bbe887
Merge: 5a23b18e 84bad6c7
Author: Kenneth Benzie (Benie) <[email protected]>
Date:   Wed May 8 13:25:25 2024 +0100

    Merge pull request #1400 from steffenlarsen/steffen/record_event

    [UR][L0][CUDA][HIP] Add enqueue timestamp recording extension

kbenzie avatar May 08 '24 12:05 kbenzie

UR changes have been merged (though a small fix patch https://github.com/oneapi-src/unified-runtime/pull/1588 is needed) so this is ready for review.

Tag @intel/llvm-reviewers-cuda @intel/dpcpp-tools-reviewers @intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers

steffenlarsen avatar May 08 '24 13:05 steffenlarsen

@ldrumm - Friendly ping.

steffenlarsen avatar May 17 '24 12:05 steffenlarsen

@ldrumm - Friendly ping.

steffenlarsen avatar May 23 '24 06:05 steffenlarsen

@ldrumm - Friendly ping.

steffenlarsen avatar May 28 '24 08:05 steffenlarsen

Failure due to include of sycl/sycl.hpp addressed by https://github.com/intel/llvm/pull/13970.

steffenlarsen avatar May 30 '24 13:05 steffenlarsen

profiling_queue.cpp will be temporarily disabled for FPGA emulator while the failure is being investigated: https://github.com/intel/llvm/pull/13971

steffenlarsen avatar May 30 '24 13:05 steffenlarsen

Is there an estimated timeline for when this feature will be available in an upcoming Intel compiler release?

stgeke avatar Sep 17 '24 07:09 stgeke

Hi @stgeke! Thank you for the interest. I cannot comment on the timeline, but this is expected to be part of the next compiler release.

steffenlarsen avatar Sep 17 '24 08:09 steffenlarsen