
Discuss PTI library Callback API

Open jfedorov opened this issue 11 months ago • 5 comments

PTI plans to implement Callback functionality. Some work related to API IDs is already in progress. Please provide your comments.

PTI Callback API

Overview

The PTI Callback API allows a PTI user to register a callback function that is called on entry to and exit from a driver or runtime function call, as well as when critical events happen in PTI itself.

To distinguish the APIs for which a user wants to set a callback, PTI introduces PTI Domains, API Groups, and API IDs within API Groups. API Groups distinguish, for example, between different APIs that could serve as the driver: OpenCL and Level-Zero. Once an API ID for a function is published in a PTI header file, it is constant and will never change.

Critical events are those that occur during profiling and either prevent profiling from continuing or cause significant data loss or corruption. Some of these events may be reported explicitly through the return code of a PTI function. Others occur during profiling and cannot be reported to the user via a return code. This is what the callback domain PTI_DOMAIN_PTI_CRITICAL_EVENT is for.

Data structures


typedef enum _pti_callback_domain {
   PTI_DOMAIN_INVALID = 0,
   PTI_DOMAIN_DRIVER_API = 1,
   PTI_DOMAIN_RUNTIME_API = 2,
   PTI_DOMAIN_PTI_CRITICAL_EVENT = 3
} pti_callback_domain;

typedef enum _pti_api_group_id {
  PTI_API_GROUP_RESERVED  = 0,
  PTI_API_GROUP_LEVELZERO = 1,
  PTI_API_GROUP_OPENCL    = 2,
  PTI_API_GROUP_SYCL      = 3,
} pti_api_group_id;

typedef enum _pti_callback_site {
  PTI_SITE_ENTER = 0, 
  PTI_SITE_EXIT = 1
} pti_callback_site;

typedef struct _pti_apicall_callback_data {
  pti_callback_site  _site;                    // ENTER or EXIT
  const char*        _api_name;
  const void*        _args;                    // pointer to the arguments passed to the function;
                                               // nullptr if PTI does not provide the arguments
  const char*        _kernel_name;             // valid only for DRIVER or RUNTIME APIs related to GPU kernel submission
  const void*        _return_code;             // valid only for L0 API EXIT; nullptr for all others
  pti_backend_ctx_t  _backend_context_handle;  // L0 or OpenCL context
  uint32_t           _correlation_id;          // ID that corresponds to the same call reported by View API records
  uint64_t*          _local_data;              // user data passed between ENTER and EXIT
} pti_apicall_callback_data;

typedef struct _pti_critical_event_callback_data {
  pti_result  _pti_event;
  char*       _message;
} pti_critical_event_callback_data;

typedef void (*pti_callback_function)(
    void*                user_data,
    pti_callback_domain  domain,
    pti_api_group_id     group_id,  // for the Driver domain it might be L0, OpenCL, ...
    uint32_t             api_id,
    void*                cb_data);  // depending on the domain, cast to a pointer to either
                                    // pti_apicall_callback_data or pti_critical_event_callback_data
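
To make the `cb_data` casting rule concrete, here is a minimal sketch of a tool-side callback that dispatches on the domain. It is not taken from PTI: the enums and structs are copied from the proposal above so the snippet compiles on its own, `pti_result` and `pti_backend_ctx_t` are replaced by assumed stand-ins because their definitions are not shown here, and `tool_counters` and `tool_callback` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-ins: the proposal references these types but does not define them here. */
typedef int   pti_result;
typedef void* pti_backend_ctx_t;

/* Copied from the proposal so the sketch is self-contained. */
typedef enum { PTI_DOMAIN_INVALID = 0, PTI_DOMAIN_DRIVER_API = 1,
               PTI_DOMAIN_RUNTIME_API = 2, PTI_DOMAIN_PTI_CRITICAL_EVENT = 3 } pti_callback_domain;
typedef enum { PTI_API_GROUP_RESERVED = 0, PTI_API_GROUP_LEVELZERO = 1,
               PTI_API_GROUP_OPENCL = 2, PTI_API_GROUP_SYCL = 3 } pti_api_group_id;
typedef enum { PTI_SITE_ENTER = 0, PTI_SITE_EXIT = 1 } pti_callback_site;

typedef struct {
  pti_callback_site  _site;
  const char*        _api_name;
  const void*        _args;
  const char*        _kernel_name;
  const void*        _return_code;
  pti_backend_ctx_t  _backend_context_handle;
  uint32_t           _correlation_id;
  uint64_t*          _local_data;
} pti_apicall_callback_data;

typedef struct {
  pti_result  _pti_event;
  char*       _message;
} pti_critical_event_callback_data;

/* Hypothetical tool state, handed to PTI as user_data at subscription time. */
typedef struct { unsigned enters, exits, critical; } tool_counters;

/* Matches the pti_callback_function signature from the proposal. */
void tool_callback(void* user_data, pti_callback_domain domain,
                   pti_api_group_id group_id, uint32_t api_id, void* cb_data) {
  tool_counters* counters = (tool_counters*)user_data;
  (void)group_id;
  (void)api_id;
  if (counters == NULL || cb_data == NULL) return;

  switch (domain) {
    case PTI_DOMAIN_DRIVER_API:
    case PTI_DOMAIN_RUNTIME_API: {
      /* API-call domains carry pti_apicall_callback_data. */
      const pti_apicall_callback_data* call = (const pti_apicall_callback_data*)cb_data;
      if (call->_site == PTI_SITE_ENTER) counters->enters++;
      else                               counters->exits++;
      break;
    }
    case PTI_DOMAIN_PTI_CRITICAL_EVENT: {
      /* The critical-event domain carries pti_critical_event_callback_data. */
      const pti_critical_event_callback_data* ev = (const pti_critical_event_callback_data*)cb_data;
      fprintf(stderr, "PTI critical event %d: %s\n", ev->_pti_event,
              ev->_message ? ev->_message : "(no message)");
      counters->critical++;
      break;
    }
    default:
      break;  /* PTI_DOMAIN_INVALID or an unknown domain: ignore */
  }
}
```

Because the cast depends only on `domain`, a single callback can serve every domain a tool enables.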

Functions

pti_result ptiCallbackSubscribe(
                pti_callback_subscriber* subscriber,
                pti_callback_function    callback,
                void* user_data);

pti_result ptiCallbackUnsubscribe( pti_callback_subscriber subscriber);

// Enables/Disables callbacks to all APIs within domain
pti_result ptiCallbackEnableDomain(
                uint32_t enable, 
                pti_callback_subscriber subscriber,
                pti_callback_domain  domain);

// Enables/Disables callback for the specific functions within domain and api_group
pti_result ptiCallbackEnable(
                uint32_t enable, 
                pti_callback_subscriber subscriber,
                pti_api_group_id   api_group_id,   // one of L0, OpenCL, Sycl
                pti_api_id    api_id);
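
The intended call sequence (subscribe, enable a domain, receive callbacks, disable, unsubscribe) can be sketched against a small in-process mock. Everything below is an assumption made for illustration: the function bodies are mocks and not PTI code, `pti_callback_subscriber` is treated as an integer handle, `pti_result` as an `int` with made-up `MOCK_*` values, and `mock_emit` is a hypothetical stand-in for PTI invoking callbacks internally. `ptiCallbackEnable` for individual API IDs is left out to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for types the proposal does not define here. */
typedef int      pti_result;
typedef uint32_t pti_callback_subscriber;
enum { MOCK_OK = 0, MOCK_BAD_ARGUMENT = 1 };

typedef enum { PTI_DOMAIN_INVALID = 0, PTI_DOMAIN_DRIVER_API = 1,
               PTI_DOMAIN_RUNTIME_API = 2, PTI_DOMAIN_PTI_CRITICAL_EVENT = 3 } pti_callback_domain;
typedef enum { PTI_API_GROUP_RESERVED = 0, PTI_API_GROUP_LEVELZERO = 1,
               PTI_API_GROUP_OPENCL = 2, PTI_API_GROUP_SYCL = 3 } pti_api_group_id;
typedef void (*pti_callback_function)(void* user_data, pti_callback_domain domain,
                                      pti_api_group_id group_id, uint32_t api_id, void* cb_data);

/* Mock registry: one subscriber with a per-domain enable bitmask. */
static struct { pti_callback_function cb; void* user_data; uint32_t domain_mask; } g_sub;

pti_result ptiCallbackSubscribe(pti_callback_subscriber* subscriber,
                                pti_callback_function callback, void* user_data) {
  if (subscriber == NULL || callback == NULL || g_sub.cb != NULL) return MOCK_BAD_ARGUMENT;
  g_sub.cb = callback;
  g_sub.user_data = user_data;
  g_sub.domain_mask = 0u;
  *subscriber = 1u;  /* the mock hands out a single fixed handle */
  return MOCK_OK;
}

pti_result ptiCallbackUnsubscribe(pti_callback_subscriber subscriber) {
  if (subscriber != 1u || g_sub.cb == NULL) return MOCK_BAD_ARGUMENT;
  g_sub.cb = NULL;
  g_sub.user_data = NULL;
  g_sub.domain_mask = 0u;
  return MOCK_OK;
}

pti_result ptiCallbackEnableDomain(uint32_t enable, pti_callback_subscriber subscriber,
                                   pti_callback_domain domain) {
  if (subscriber != 1u || g_sub.cb == NULL) return MOCK_BAD_ARGUMENT;
  if (domain != PTI_DOMAIN_DRIVER_API && domain != PTI_DOMAIN_RUNTIME_API &&
      domain != PTI_DOMAIN_PTI_CRITICAL_EVENT) return MOCK_BAD_ARGUMENT;
  if (enable) g_sub.domain_mask |= (1u << domain);
  else        g_sub.domain_mask &= ~(1u << domain);
  return MOCK_OK;
}

/* Mock-only helper: delivers an event if the subscriber enabled its domain. */
void mock_emit(pti_callback_domain domain, pti_api_group_id group_id,
               uint32_t api_id, void* cb_data) {
  if (g_sub.cb != NULL && (g_sub.domain_mask & (1u << domain)) != 0u)
    g_sub.cb(g_sub.user_data, domain, group_id, api_id, cb_data);
}

/* Tool side: count every callback received. */
static void count_calls(void* user_data, pti_callback_domain domain,
                        pti_api_group_id group_id, uint32_t api_id, void* cb_data) {
  (void)domain; (void)group_id; (void)api_id; (void)cb_data;
  ++*(unsigned*)user_data;
}

/* Runs the full lifecycle and returns how many callbacks the tool received. */
unsigned demo_lifecycle(void) {
  unsigned calls = 0;
  pti_callback_subscriber sub = 0;
  if (ptiCallbackSubscribe(&sub, count_calls, &calls) != MOCK_OK) return 0;

  mock_emit(PTI_DOMAIN_DRIVER_API, PTI_API_GROUP_LEVELZERO, 7u, NULL);  /* dropped: not enabled yet */
  ptiCallbackEnableDomain(1u, sub, PTI_DOMAIN_DRIVER_API);
  mock_emit(PTI_DOMAIN_DRIVER_API, PTI_API_GROUP_LEVELZERO, 7u, NULL);  /* delivered */
  mock_emit(PTI_DOMAIN_RUNTIME_API, PTI_API_GROUP_SYCL, 3u, NULL);      /* dropped: other domain */
  ptiCallbackEnableDomain(0u, sub, PTI_DOMAIN_DRIVER_API);
  mock_emit(PTI_DOMAIN_DRIVER_API, PTI_API_GROUP_LEVELZERO, 7u, NULL);  /* dropped: disabled again */

  ptiCallbackUnsubscribe(sub);
  return calls;
}
```

In this mock, subscribing alone delivers nothing; only the one event emitted while the driver domain is enabled reaches the tool, so `demo_lifecycle()` returns 1. Whether the real implementation starts with all domains disabled is an assumption of the sketch.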

jfedorov, Jan 22 '25 00:01

As far as I understand the current code, there is no mechanism for host side callbacks at the moment, right? The general proposed idea looks like a good step forward.

If I understand correctly, a tool would need to do these two steps:

  • Provide a callback function
  • Register it for one or more functions or domains, using the subscriber handle obtained from ptiCallbackSubscribe

Then, the tool would receive the enter and exit events for the registered handles. A tool could find out the actual function via the pti_callback_domain and pti_api_group_id, which could be used as keys to store e.g. a region handle for this function in a hashmap. The name, correlation id and such can be retrieved by converting the void pointer to the respective data type.
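
As a rough sketch of the hashmap idea, a tool could pack the identifiers it receives in the callback into one integer key and look up a region handle with it. This is purely illustrative and not part of PTI or Score-P: `region_table`, `make_key`, and `region_for` are hypothetical names, the table size is arbitrary, and the key also includes `api_id`, on the assumption that domain and group alone do not identify a single function.

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_TABLE_SIZE 1024u  /* hypothetical capacity; must be a power of two */

typedef struct { uint64_t key; uint32_t region; int used; } region_slot;

typedef struct {
  region_slot slots[REGION_TABLE_SIZE];
  uint32_t    next_region;  /* last region handle handed out; 0 means none yet */
} region_table;

/* Pack (domain, group, api id) into a single 64-bit key. */
static uint64_t make_key(uint32_t domain, uint32_t group_id, uint32_t api_id) {
  return ((uint64_t)(domain & 0xFFFFu) << 48) |
         ((uint64_t)(group_id & 0xFFFFu) << 32) |
         (uint64_t)api_id;
}

/* Returns the region handle for this function, creating one on first sight.
   Returns 0 if the table is full. Uses open addressing with linear probing. */
uint32_t region_for(region_table* table, uint32_t domain, uint32_t group_id, uint32_t api_id) {
  if (table == NULL) return 0;
  const uint64_t key = make_key(domain, group_id, api_id);
  /* Multiplicative hash; unsigned overflow wraps, which is well defined in C. */
  uint32_t index = (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> 32) & (REGION_TABLE_SIZE - 1u);

  for (uint32_t probes = 0; probes < REGION_TABLE_SIZE; ++probes) {
    region_slot* slot = &table->slots[index];
    if (!slot->used) {
      slot->used = 1;
      slot->key = key;
      slot->region = ++table->next_region;
      return slot->region;
    }
    if (slot->key == key) return slot->region;
    index = (index + 1u) & (REGION_TABLE_SIZE - 1u);
  }
  return 0;  /* table full */
}
```

The table must be zero-initialized before first use, and it is not thread-safe; a real tool would need locking or per-thread tables if callbacks can arrive on several threads.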

What I don't understand yet is how a tool would be able to parse _args. There's only a void* for this in this proposal, and a tool would need to know how to parse it. Honestly, what CUPTI has isn't ideal either, but it at least works in that regard.

Thinking about Score-P for example, we do parse the amount of memory users try to allocate for certain calls. While this doesn't necessarily represent the actual memory being allocated, this is still an interesting metric for us. For this, we need to be able to parse the passed arguments, or obtain the information by some other means (not in an activity buffer later on).

Thyre, Sep 17 '25 13:09

Hi @Thyre, thanks for your interest. In fact, we are working at full speed on the callback API and will soon submit a PR for discussion. It is based on the earlier comments from @jmellorcrummey (https://github.com/intel/pti-gpu/pull/87). This PR will be a rather limited implementation, but the API is more complete.

Important to mention: at first we define and implement

  1. more generic domains, rather than a simple callback on a specific L0 API. For example, DOMAIN_APPEND would correspond to a number of L0 APIs that append operations (kernel or memory op) to a command list
  2. data delivered to such a callback is "processed", rather than passing the "raw" arguments of L0 APIs. E.g. instead of a pointer to a kernel, there will be a unique operation id and kernel id that can be matched with other records delivered by PTI (e.g. with View records)
  3. some callbacks to generic domains can be used as a way to extend the PTI collector.

We've also noted your requirement to get all arguments of a specific call in order to extract useful information. We can provide a mechanism for accessing all arguments of an API.

The challenge is how to make generic domains work together with API-specific domains consistently and without confusion.

  • if a user enables both a generic domain and an API-specific domain, should they get at least two callbacks for the same L0 API?
  • what about modifying the PTI flow through an API-specific domain? Maybe we should prohibit this, or rather rely on the user knowing what they are doing?

It would be great to hear your opinion on this.

jfedorov, Sep 19 '25 11:09

Thanks for the quick feedback 😄 It's great to hear that you're working on this part of PTI-SDK and I'm looking forward to the results.

> Important to mention: at first we define and implement
>
>   1. more generic domains, rather than a simple callback on a specific L0 API. For example, DOMAIN_APPEND would correspond to a number of L0 APIs that append operations (kernel or memory op) to a command list
>   2. data delivered to such a callback is "processed", rather than passing the "raw" arguments of L0 APIs. E.g. instead of a pointer to a kernel, there will be a unique operation id and kernel id that can be matched with other records delivered by PTI (e.g. with View records)
>   3. some callbacks to generic domains can be used as a way to extend the PTI collector.

I think these are valid approaches. Combining a set of API calls into domains (or groups) is a good idea, and helps in handling these kinds of domains more easily. This may remove the need to basically parse all the different calls an API provides just to retrieve some arguments (e.g. the amount of device memory requested in an allocation call).

Can a tool still get the e.g. L0 API name with this generic domain approach (e.g. by having the API ID and retrieving it via a function call)? If so, that would be perfect. If not, the amount of information for the user could be a bit more limited, but one could work with that.

Not having raw data is fine for the most part I'd say. Most arguments aren't that interesting for us. However, things like unique operation id, kernel id, allocated memory and so on are. I'll keep an eye on your PR and provide feedback once that is open 😄

> We've also noted your requirement to get all arguments of a specific call in order to extract useful information. We can provide a mechanism for accessing all arguments of an API.
>
> The challenge is how to make generic domains work together with API-specific domains consistently and without confusion.
>
> * if a user enables both a **generic domain** and an **API-specific domain**, should they get at least two callbacks for the same L0 API?
> * what about modifying the PTI flow through an **API-specific domain**? Maybe we should prohibit this, or rather rely on the user knowing what they are doing?
>
> It would be great to hear your opinion on this.

I don't think that we should allow a tool to modify the flow, or even the passed user parameters. There are APIs where this is possible, e.g. MPI/PMPI, and maybe there are tool domains where this is interesting. However, I think that this complicates the implementation quite a bit.

Regarding having generic and API-specific domains, I'd lean toward allowing only one of them to be active at a time. This would reduce the number of issues that can come up. I don't think that tools are particularly interested in having both active at the same time anyway. With this, you would have the option to work on the fully generic domain first, and add API-specific domains later if desired.
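
One way such a "one kind at a time" rule could be enforced is a small piece of per-subscriber state that rejects enabling a domain of the other kind. This is only a sketch of the policy, not a proposal for PTI's actual API or internals: `domain_kind`, `policy_result`, `subscriber_policy`, and the two functions are all hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of callback domains. */
typedef enum { KIND_NONE = 0, KIND_GENERIC = 1, KIND_API_SPECIFIC = 2 } domain_kind;
typedef enum { POLICY_OK = 0, POLICY_CONFLICT = 1, POLICY_BAD_ARGUMENT = 2 } policy_result;

/* Per-subscriber state: which kind is active and how many domains of it are enabled. */
typedef struct {
  domain_kind active;
  uint32_t    enabled_count;
} subscriber_policy;

/* Record that a domain of the given kind is being enabled.
   Fails with POLICY_CONFLICT while domains of the other kind are still enabled. */
policy_result policy_enable(subscriber_policy* policy, domain_kind kind) {
  if (policy == NULL || kind == KIND_NONE) return POLICY_BAD_ARGUMENT;
  if (policy->active != KIND_NONE && policy->active != kind) return POLICY_CONFLICT;
  policy->active = kind;
  policy->enabled_count++;
  return POLICY_OK;
}

/* Record that a domain of the given kind is being disabled.
   Once the last one is gone, the subscriber may switch to the other kind. */
policy_result policy_disable(subscriber_policy* policy, domain_kind kind) {
  if (policy == NULL || kind == KIND_NONE) return POLICY_BAD_ARGUMENT;
  if (policy->active != kind || policy->enabled_count == 0u) return POLICY_BAD_ARGUMENT;
  if (--policy->enabled_count == 0u) policy->active = KIND_NONE;
  return POLICY_OK;
}
```

With this scheme, a subscriber that has a generic domain enabled gets `POLICY_CONFLICT` when it tries to enable an API-specific one, and can switch only after disabling every generic domain, so the same L0 call never produces callbacks of both kinds.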

Thyre, Sep 19 '25 11:09

> Can a tool still get the e.g. L0 API name with this generic domain approach (e.g. by having the API ID and retrieving it via a function call)? If so, that would be perfect. If not, the amount of information for the user could be a bit more limited, but one could work with that.

Yes, the API ID is part of the callback data. These IDs are already defined here. There is an API to retrieve the name by ID.

thanks a lot for the feedback!

I will try to submit what is already available as a draft PR, so you can review everything that is there.

jfedorov, Sep 19 '25 12:09

Tagging @yuninxia, who has built support for PC sampling in a branch of HPCToolkit using the metrics API. He wants to follow these discussions.

jmellorcrummey, Sep 19 '25 22:09