Add API to allow extensions to set callback function on creation, modification, and destruction of PyFunctionObject
| BPO | 46897 |
|---|---|
| Nosy | @carljm, @DinoV, @itamaro, @mpage |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = None
created_at = <Date 2022-03-01.22:19:44.618>
labels = ['expert-C-API', 'type-feature', '3.11']
title = 'Add API to allow extensions to set callback function on creation, modification, and destruction of PyFunctionObject'
updated_at = <Date 2022-03-01.22:19:44.618>
user = 'https://github.com/mpage'
bugs.python.org fields:
activity = <Date 2022-03-01.22:19:44.618>
actor = 'mpage'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['C API']
creation = <Date 2022-03-01.22:19:44.618>
creator = 'mpage'
dependencies = []
files = []
hgrepos = []
issue_num = 46897
keywords = []
message_count = 1.0
messages = ['414308']
nosy_count = 4.0
nosy_names = ['carljm', 'dino.viehland', 'itamaro', 'mpage']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue46897'
versions = ['Python 3.11']
- PR: gh-98175
CPython extensions providing optimized execution of Python bytecode (e.g. the Cinder JIT) may need to hook into the lifecycle of function objects to determine what to optimize, invalidate previously-optimized functions, or free resources allocated for functions that no longer exist. For example, when inlining a function, the Cinder JIT will use the bytecode of the inlined function that was known at compile-time. If the bytecode for the inlined function changes at runtime (i.e. if __code__ was reassigned) the JIT needs to invalidate any code into which the function was inlined. We propose adding an API to allow extensions to set callbacks that will be invoked whenever functions are created, modified, or destroyed.
Proposed API:
typedef enum {
    PYFUNC_LCEVT_CREATED,
    PYFUNC_LCEVT_MODIFIED,
    PYFUNC_LCEVT_DESTROYED
} PyFunction_LifecycleEvent;

typedef enum {
    PYFUNC_ATTR_CODE,
    PYFUNC_ATTR_GLOBALS,
    PYFUNC_ATTR_DEFAULTS,
    PYFUNC_ATTR_KWDEFAULTS,
    PYFUNC_ATTR_CLOSURE,
    PYFUNC_ATTR_NOT_APPLICABLE,
} PyFunction_AttrId;

// A callback to be called in response to events in a function's lifecycle.
//
// The callback is invoked after a function is created and before the function
// is modified or destroyed.
//
// On modification, the third argument indicates which attribute was modified
// and the fourth argument is the new value. Otherwise, the third argument is
// PYFUNC_ATTR_NOT_APPLICABLE and the fourth argument is NULL.
typedef void (*PyFunction_LifecycleCallback)(
    PyFunction_LifecycleEvent event,
    PyFunctionObject *func,
    PyFunction_AttrId attr,
    PyObject *new_value);

void PyFunction_SetLifecycleCallback(PyFunction_LifecycleCallback callback);
PyFunction_LifecycleCallback PyFunction_GetLifecycleCallback(void);
@markshannon How does this proposal from @mpage fit into our current plans?
First we need to do something about comprehensions (and nested functions). Function objects are created for each comprehension. Ideally we would not create functions for list comprehensions, or treat them specially, as the function objects are inaccessible. Closures are trickier, as they can be ephemeral, but are accessible.
Provided we can avoid hooking into the lifetime of these short-lived objects, adding hooks makes sense. We will need to handle much the same set of events as Cinder does.
I'd like to add these hooks in a principled way in the broader context of handling potential de-optimization events. Maybe extending the API for dictionary watchers?
We have code in cinder's compiler to inline comprehensions instead of creating a function. It is a perf win but there is a semantic compromise in scoping / name visibility, not sure that would be acceptable.
Can you outline what you're thinking in terms of a unified API? My intuition is that the needs and details of, e.g., dict watching vs function watching are sufficiently different that separate APIs in PyDict_* and PyFunction_* will probably be simpler and clearer to use, but I'm open to suggestions.
Ping @markshannon?
> We have code in cinder's compiler to inline comprehensions instead of creating a function. It is a perf win but there is a semantic compromise in scoping / name visibility, not sure that would be acceptable.
I doubt that it would be in general, but for cases where the iteration variable is not shadowed or used outside the comprehension it might be. It would need a wider discussion.
> Can you outline what you're thinking in terms of unified API?
It is a bit vague at the moment, but I do need to write it up properly. For now, I'm thinking that any object that is depended on, whether dict, function or class would be allocated an ID. Optimized code would depend on a set of IDs. If any object with an ID changes then the associated optimized code(s) would be invalidated. It should be possible to implement this reasonably efficiently with bloom filters, or radix trees, or some other suitable data structure.
How does Cinder handle this?
Why do functions need callbacks, rather than code objects?
Once we have implemented https://github.com/faster-cpython/ideas/issues/446, we can effectively specialize calls to comprehensions and nested functions, without caring about the lifetimes of the individual function objects.
> I'm thinking that any object that is depended on, whether dict, function or class would be allocated an ID. Optimized code would depend on a set of IDs. If any object with an ID changes then the associated optimized code(s) would be invalidated.
I don't think this type of API will be sufficient for us in general. We want to be able to do code-patching on many updates, not just invalidate all generated code on any change. One reason this matters is that it handles the problem of OSR for functions further up the stack. If, say, a global value changes, we want to patch the generated code at the point where that specific global is loaded with an unconditional deopt instruction, which results in the correct behavior even if that optimized function is already mid-execution somewhere up the stack. So, e.g., in the dict watchers API, we need details about what changed in the dict, not just the fact that the dict changed. Another example: we do granular invalidation of our "inline" caches if the type whose information we cached changes; we won't just throw away the generated code (which is independent of the caches and is still valid).
> How does Cinder handle this?
We use callback functions on relevant modifications to dictionaries, types, and functions, with custom handling appropriate to each case.
> Why do functions need callbacks, rather than code objects?
I think the only case in which we actually depend on func-modified hooks today is if the function's __code__ is changed. Because we change a function's vectorcall entrypoint when we compile it, if its __code__ changes we need to reset to the default vectorcall entrypoint. I'm not sure if we have use cases in mind for hooking into other changes to funcs; maybe @mpage or @swtaarrs can weigh in if I'm missing something. We do modify MAKE_FUNCTION today, which in the future we might want to use a func-created hook for instead.