pti-gpu

[PTI-LIB] Define Callback API and make it work for two domains

Open jfedorov opened this issue 3 months ago • 3 comments

Description

This is an initial implementation of the PTI Synchronous Callback API. It serves as the foundation for the PTIMetricsScope API (to be added soon), which will collect hardware metrics for individual GPU operations via the Event Query mechanism.

This PR

  • implements the Callback API for 2 domains only, and for immediate command lists only
  • adds a callback sample
  • adds the sample to the tests and runs ASan and TSan on it
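
For readers unfamiliar with the pattern, here is a minimal sketch of what a synchronous callback mechanism restricted to two domains can look like. It is not the API added by this PR: every name in it (`Domain`, `Phase`, `CallbackData`, `CallbackRegistry`, `RunDemo`, and the two domain names) is hypothetical and chosen only for illustration. The sketch assumes that "synchronous" means the subscriber's callback runs on the thread that triggers the event, before control returns to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; they are NOT the PTI SDK API.
enum class Domain { kOperationAppended, kOperationCompleted };  // two domains only
enum class Phase { kEnter, kExit };

struct CallbackData {
  Domain domain;
  Phase phase;
  std::uint64_t operation_id;  // identifies one GPU operation
};

using Callback = std::function<void(const CallbackData&)>;

class CallbackRegistry {
 public:
  // Registers a callback for a single domain.
  void Subscribe(Domain domain, Callback cb) {
    assert(cb && "callback must be callable");
    std::lock_guard<std::mutex> lock(mutex_);
    subscribers_[domain].push_back(std::move(cb));
  }

  // Synchronous dispatch: every callback runs on the calling thread
  // and finishes before Notify returns.
  void Notify(const CallbackData& data) const {
    std::vector<Callback> snapshot;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = subscribers_.find(data.domain);
      if (it == subscribers_.end()) return;
      snapshot = it->second;  // copy so callbacks run without holding the lock
    }
    for (const auto& cb : snapshot) cb(data);
  }

 private:
  mutable std::mutex mutex_;
  std::map<Domain, std::vector<Callback>> subscribers_;
};

// Simulates one instrumented operation: notify on enter, then on exit.
std::vector<std::string> RunDemo() {
  CallbackRegistry registry;
  std::vector<std::string> log;
  registry.Subscribe(Domain::kOperationAppended, [&log](const CallbackData& d) {
    log.push_back((d.phase == Phase::kEnter ? "enter:" : "exit:") +
                  std::to_string(d.operation_id));
  });
  registry.Notify({Domain::kOperationAppended, Phase::kEnter, 7});
  registry.Notify({Domain::kOperationAppended, Phase::kExit, 7});
  registry.Notify({Domain::kOperationCompleted, Phase::kEnter, 7});  // no subscriber
  return log;
}
```

Copying the subscriber list before invoking callbacks keeps the lock from being held during user code, which is one way to keep a sketch like this clean under TSan. Whether the implementation in this PR takes the same approach is not stated here.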

Area of the change

  • [x] PTI SDK
  • [ ] Unitrace
  • [ ] Other Tool(s) - e.g. Sysmon, or any code above SDK directory (except Unitrace)
  • [ ] Infrastructure - e.g. GitHub workflows, and other whole repo impacted

Type(s) of change

Choose one or multiple, leave empty if none of the other choices apply

  • [ ] Bug fix - change that fixes an issue
  • [x] New feature - change that adds functionality
  • [ ] Performance improvement - change that lowers profiling overhead
  • [ ] Tests - change in tests
  • [x] Samples - change in samples
  • [ ] Documentation - documentation update

Tests

  • [ ] Added - required for new features and some bug fixes. Note: tests will be added in the next PR.
  • [ ] Not needed

Specific HW and OS where to run the test unless generic:

For example, 2 discrete GPUs, integrated GPU, specific GPU model, PyTorch integration test(s)

Checklist

  • [x] Have all tests, except Quarantined, passed locally?
  • [ ] Do all newly added source files have a license header?

Details on API(s) or command line option(s) changes

  • [ ] API(s) or command line options not changed
  • [x] New API or command line options added
  • [ ] Existing API(s) or command line options changed - so backward compatibility broken
  • [ ] Unknown

If applies - details on the broken backward compatibility

Indicate which API(s) or option(s) lose backward compatibility, why that might be OK, or suggest how to deal with it moving forward

Notify the following users

@jmellorcrummey, @Thyre, @anmyachev, @yuninxia, @mschilling0, @Rogersyp

Other information

jfedorov, Sep 22 '25 10:09

@Thyre, thank you for your comments. I might not be able to answer or address them all today or tomorrow, but once I am back I will go over all of these and the others. In the short term, maybe @mschilling0 and @Rogersyp can comment.

jfedorov, Sep 22 '25 12:09

@Thyre, thank you for your comments. I might not be able to answer or address them all today or tomorrow, but once I am back [...]

There's no need to rush things :smile: (at least not from my side). Thanks again for accepting feedback on this at all. I think this helps everyone in the end.

Thyre, Sep 22 '25 15:09

@Thyre, thank you for your comments. I might not be able to answer or address them all today or tomorrow, but once I am back I will go over all of these and the others. In the short term, maybe @mschilling0 and @Rogersyp can comment.

I left some comments internally; I'll take some of the API comments and put them here.

mschilling0, Sep 26 '25 17:09