
Exposing APIs from HPy extensions to be used by other HPy extensions

steve-s opened this issue 2 years ago • 9 comments

The motivating example is the NumPy API that is exposed to other Python extensions such that they can work with arrays natively/directly without a round-trip through Python code/abstractions.

How the NumPy API works at the moment:

  • NumPy provides a header file defining a struct that holds pointers to some objects (e.g., the array type) and to the API functions; this is similar in spirit to HPyContext
  • NumPy exposes a PyCapsule with a pointer to this struct filled with pointers to the implementation
  • 3rd party extension includes the NumPy header file, fishes the PyCapsule from NumPy, gets the raw C pointer from it and uses it to call the NumPy API through the struct

The very same scheme can work with HPy, but has one drawback: the 3rd party extension gets some HPyContext and passes it to NumPy, which means:

  • NumPy must be built for an ABI-compatible HPyContext version (it could be a lower minor version, because those are binary compatible)
  • Before this, the Python VM could (for some optimization/implementation reason) pass a different HPyContext instance to different packages (it can store per-module state in it, for example). With the HPyContext flowing from one extension to another, this is no longer possible.
  • In general it may be useful to be able to intercept and control the communication between extensions

Are those restrictions problematic enough to seek a better solution?

One possibility is to provide some way to "wrap" function pointers with a trampoline that can "transform" the HPyContext to another if necessary. Example in code:

// NumPy:
HPy my_api_function(HPyContext *ctx, HPy h) { ... }
// ...
numpy_api_capsule->my_api_function_pointer = HPy_AsAPI(ctx, &my_api_function);

// 3rd party using the API to call the function:
numpy_api_capsule->my_api_function_pointer(my_hpy_context, my_handle);

// HPy universal implementation of the generated trampoline would be:

HPy_API_token numpy_token; // implementation specific: 
// a pointer to anything the implementation needs, initialized in the HPy_AsAPI call

HPy my_api_function_trampoline(HPyContext *caller_ctx, HPy h) {
    HPyContext *numpyCtx = _HPy_TransformContext(caller_ctx, numpy_token); // part of ABI, not API
    return my_api_function(numpyCtx, h);
}

The question is how to generate the trampoline. We can use macros for that, something like HPy_APIDef(...). As a bonus we could generate CPython API trampolines, so that the API would be usable from non-HPy packages (NumPy would have to expose another capsule with the CPython trampolines to be used by non-HPy packages).

steve-s avatar Jul 25 '23 10:07 steve-s

Are those restrictions problematic enough to seek a better solution?

IMO, we definitely need some interception. I can add the following point:

  • If module A uses module B (e.g. Pandas uses NumPy) and B was loaded in debug or trace mode but A wasn't, then passing on A's HPyContext would also mean running B's code in a different mode than it was loaded in.

It may be the case that it is fine to pass the HPyContext to the next module but I think we shouldn't assume that in general.

One possibility is to provide some way to "wrap" function pointers with a trampoline that can "transform" the HPyContext to another if necessary.

Sounds good to me. I'm just not so sure about this:

numpy_api_capsule->my_api_function_pointer = HPy_AsAPI(ctx, &my_api_function);

Would HPy_AsAPI return the function pointer of the trampoline (i.e. my_api_function_trampoline in the above example)? If so, a macro like the suggested HPy_APIDef would certainly generate some kind of definition (just like HPyDef_METH or similar) and we would pass the definition to HPy_AsAPI.

fangerer avatar Jul 25 '23 12:07 fangerer

Would HPy_AsAPI return the function pointer of the trampoline (i.e. my_api_function_trampoline in the above example)? If so, a macro like the suggested HPy_APIDef would certainly generate some kind of definition (just like HPyDef_METH or similar) and we would pass the definition to HPy_AsAPI.

Good point. Yes, we should probably do exactly the same thing as with HPyDef_METH -- it would generate a struct and one would pass that to HPy_AsAPI, or maybe HPy_GetAPI.

steve-s avatar Jul 25 '23 14:07 steve-s

Packages from top4000 with string "PyArrayObject" in their sources:

asammdf astropy Bottleneck cvxpy dedupe ecos fastcluster GDAL matplotlib numba numexpr numpy opencv osqp pandas pyerfa python scipy scs shap Theano

Do we know of any other package that exposes some C API? I looked at pandas; they don't have one. What is NumPy's take on its C API: should people ideally be using memoryviews and other generic means instead of NumPy's C API? If that were the case, we could also say that exposing one's own C API is something that should not be done and hence is not supported in HPy.

steve-s avatar Aug 03 '23 08:08 steve-s

What is NumPy's take on its C API: should people be ideally using the memory view and other generic means over the NumPy's C API?

I would assume that since there is the array API and NumPy implements it (https://numpy.org/doc/stable/reference/c-api/array.html), NumPy's take is not necessarily to use memory view. But I don't know.

fangerer avatar Aug 03 '23 09:08 fangerer

Isn't that API on the Python level?

steve-s avatar Aug 03 '23 09:08 steve-s

It would be nice if people used the DLPack interface, which provides a standard way to interact with array-like objects. But thinking about this more deeply, it seems that if the HPy port of NumPy must export some kind of C API, it would still have to be able to export exactly the CPython PyArrayObject. Refactoring code like this from matplotlib to avoid the NumPy C API (with PyArrayObject) is not going to be easy; it would require replacing their numpy::array_view C++ class with something else, or at least rethinking all the incref/decref in that class.

So if we are confined to use PyArrayObject, can we export that from an HPy port of NumPy without using legacy mode?

mattip avatar Aug 03 '23 10:08 mattip

Note that all the DLPack interface requires is capsule support, which HPy has.

mattip avatar Aug 03 '23 10:08 mattip

Cython also contains a system for exposing your types/functions as an API, via automatic capsule use. But it also has internal shared-code capabilities: if you import multiple Cython modules (transpiled with the same version), they'll share the implementation of the custom function type, things like that.

TeamSpen210 avatar Aug 03 '23 21:08 TeamSpen210

Related discussion: https://discuss.python.org/t/changing-the-pycapsule-api-to-better-support-versions/54860

steve-s avatar Jun 11 '24 10:06 steve-s