dlpack icon indicating copy to clipboard operation
dlpack copied to clipboard

[RFC] DLPack C Exchange API Attribute to PyCapsule

Open tqchen opened this issue 3 weeks ago • 20 comments

Background in https://github.com/dmlc/dlpack/issues/175 we introduced the C exchange API. In the original discussion we have considered proposals of PyCapsule as field and in dunder. We discussed the choice array-api meeting session where and I think everyone agreed both path can work, but then eventually went with int.

After recent discussion with PyTorch https://github.com/pytorch/pytorch/issues/162845#issuecomment-3609127149 there is a good feedback that having PyCapsule with name "dlpack_exchange_api" would provide better checking guarantees.

Given the original discussion everyone seems to agree both can work, I think it is good to take the additional feedback and update to bring PyCapsule support here. This RFC proposes to update the convention to allow the __c_dlpack_exchange_api__ field to take PyCapsule (with name "dlpack_exchange_api").

tqchen avatar Dec 03 '25 23:12 tqchen

cc @seberg @rgommers @leofang @dalcinl @albanD @ngimel

tqchen avatar Dec 03 '25 23:12 tqchen

I'm a bit confused. What exactly __c_dlpack_exchange_api__ is supposed to be? I would argue is has to be a callable, such that capsule = exporter.__c_dlpack_exchange_api__(), like the current __dlpack[_device]__ dunders.

dalcinl avatar Dec 04 '25 11:12 dalcinl

It was discussed as a class attribute and not a method, since there should never be a reason to pass any arguments. That said now that you mention it, a class attribute is slightly more annoying to implement as it has to be put into the tp_dict in the C-API (overhead wise that might still be better since at least you can do a getattr(x, "attr", NotImplemented) to avoid an explicit hasattr check or errors (I can't think of a "call if exists" style API).

I don't see much of a downside to using a capsule, it seems largely a matter of taste. It does add a bit of safety, although it also adds a tiny bit of overhead if you use a capsule name, which is nice for safety. (Of course all of these overheads are like 10-20ns at most.)

seberg avatar Dec 04 '25 11:12 seberg

It was discussed as a class attribute. This being said, in the PyTorch discussion the callable choice was indeed bought up by @junrushao . Explicitly listing out choices:

  • A0a. Take __c_dlpack_exchange_api__ as a PyCapsule in class attribute
    • A0b: original __c_dlpack_exchange_api__ as class attribute and int (I think we should move away from this one, given it is in latest and not a lot of implementation, it is likely OK)
  • A1. Update the __dlpack_c_exchange_api__ to be a method which returns a PyCapsule.
    • A1a: Use __c_dlpack_exchange_api__ as method name
    • A1b: Use __dlpack_c_exchange_api__ as method name

Would be great if people can chime in about preference,

Note on consumer perf: performance-wise the choices won't be too big of an issue, consumer can use a TLS(in case of future threaded-python) cache dict dlpack_exchange_api_dict : dict[Type, int] and do lazy population if needed, which likely gets us the same efficiency as raw int attribute.

tqchen avatar Dec 04 '25 12:12 tqchen

Speaking out my thoughts, after seeing latest discussions, I do not have a strong preference of A0 vs A1, i think they all can work, if we go with A1 perhaps A1b __dlpack_c_exchange_api__ is a slightly better name since the prefix aligns with dlpack dunder methods.

tqchen avatar Dec 04 '25 12:12 tqchen

It was discussed as a class attribute.

Then checking for the attribute, with let say hasattr(), always trigger all the machinery creating the capsule. That's what I do not like about it being an attribute.

dalcinl avatar Dec 04 '25 14:12 dalcinl

True, but since it is a static object (and has to be as per standard), you would be able to cache it anyway.

seberg avatar Dec 04 '25 14:12 seberg

but since it is a static object

Oh, of course, never mind.

dalcinl avatar Dec 04 '25 15:12 dalcinl

thanks @seberg @dalcinl, do u mind reply with your updated thoughts on A0/A1a/A1b ?

tqchen avatar Dec 04 '25 22:12 tqchen

performance-wise the choices won't be too big of an issue, consumer can use a TLS

Is that true? Classes can override the method per-instance. So we cannot actually cache per-type if it is a method. It would need to be a classmethod (or staticmethod) that takes a particular instance as input to be able to make this cache work.

The difference in perf from what I can see between A0 and A1 (staticmethod) when considering a user that is calling this from C is really: do a python call + wrap the DLPack Tensor into a PyCapsule + have the C code get the pointer from the PyCapsule. Some very approximate timing via timeit seems to put that overhead at ~80ns for my local cpython build.

While I do agree that A1 is a nicer way to expose this as a public API on a python object. I can understand that the extra overhead for C caller would defeat the purpose of this special fast exchange API, so A0 sounds fair as well.

albanD avatar Dec 04 '25 22:12 albanD

Classes can override the method per-instance.

@albanD this is a great point, that would indeed make A0 more appealing

tqchen avatar Dec 04 '25 22:12 tqchen

@tqchen As ultimate performance is the whole point of the new thing, everything seems to indicate that A0 is a better choice indeed.

Additionally, I would like to second your proposal of using __dlpack_c_exchange_api__ for the name of the attribute (https://github.com/dmlc/dlpack/issues/179#issuecomment-3612022617) to align better with the current dlpack dunder naming.

dalcinl avatar Dec 05 '25 09:12 dalcinl

Is that true? Classes can override the method per-instance. So we cannot actually cache per-type if it is a method.

We specified that it must be cacheable and the underlying struct must be static (so you don't eve have to hold on to the capsule, which was obvious when it was an integer :)).

We even said that the correct use is to look it up on the type: type(obj).__dlpack_c_exchange_api__, so instances are not involved and the type can be assumed immutable.

Because of that, you don't even need a TLS (if you care about subinterpreters, I guess you may need a per-interpreter cache), the cache can be global (except for race-conditions modifying it).


Now class vs. instance, I much prefer class (to prevent misguided forwarding). One note is that on older Python versions class attributes are much slower on failure (and failure is important since hasattr(obj, "...") is likely to be false). This seems solved, so I am not too worried about it. But, I did not check which version it was fixed (you can try with getattr(int, "invalid", None) vs. getattr(0, "invalid", None)).

  • An attribute will save the vectorcall and thus a few nanoseconds at least.
  • I recently noticed that adding class attributes for C types is slightly annoying. The reason is that while class methods are supported by tp_methods there is no C-level mechanism for attributes. This is not a big problem, you can put it into the types tp_dict (typically, we don't do anything for the tp_dict, but I don't).

While I find that tp_dict dance rather annoying, I don't mind it much we want to save those nanoseconds. To be clear: My understanding is that you must use the tp_dict dance if you want type(obj).__protocol__ to return the capsule, which is desirable to enforce the static nature of it.


Unless we are re-considering the class lookup requirement itself? I like staying in C. The one thing that we cannot do with a class lookup and a capsule is to define wrapper classes such as:

class MyClass:
    def __init__(self, arr):
        self.data = arr

    @property  # or say class attribute...
    def __dlpack_c_exchange_api__:
        # how to forward to `self.data`?

For that to work, we would actually have to extend this protocol to allow an instance attribute in some form (which then must return a an object that defines the C-level protocol/class attribute!).

(To be clear, I am happy to not worry about this now, one could implement that on the consumer side by checking for type(obj).__dlpack_c_exchange_api__ being the capsule and if not do an instance lookup and repeat the type lookup on returned object.)

seberg avatar Dec 05 '25 10:12 seberg

thanks all for the remarks, seems we are converging towards the following A0 proposal:

  • Keep the dunder as class static attribute, stored as PyCapsule type(obj).__dlpack_c_exchange_api__ only
  • Naming of the dunder: use __dlpack_c_exchange_api__ to align with existing dlpack dunders.

I think the naming part is minor but also think we can go with this one, so all dlpack dunders starts with prefix dlpack. Would be great to get a final confirmation, then we can proceed with changes.

cc @seberg @rgommers @leofang @dalcinl @albanD @ngimel

tqchen avatar Dec 05 '25 12:12 tqchen

PR is up in https://github.com/dmlc/dlpack/pull/180

tqchen avatar Dec 05 '25 22:12 tqchen

Unfortunately I have been unable to follow the recent, rapid advancement in DLPack, and do not fully understand what we are trying to accomplish with the added complexity. I worry that we are rushing into an uncharted territory without enough buy-in from the wider community. Would it be possible to discuss this e.g. in the next array API meeting, and give other stakeholders time to chime in?

leofang avatar Dec 06 '25 02:12 leofang

@leofang of course, happy to do that in incoming array api meeting as we did last time

tqchen avatar Dec 06 '25 12:12 tqchen

Oh it was not my intent to propose dramatic change to dlpack without sufficient buy-in from community! On behalf of tvm-ffi, I’d love to see and facilitate dlpack adoption in frameworks such as PyTorch, and that was why I was weighing in and trying to figure out what works the best. I’m happy to follow what the committee believe make sense.

junrushao avatar Dec 06 '25 17:12 junrushao

@junrushao I think this moved quickly with (to me!) unclear (experimental) implementations or experience with it. And I think this request to change something quite central shows that a bit (I may well be missing an overview from the tvm-ffi side or other stake-holders!). From a NumPy PoV, I would probably have to consider DLPack 1.2 to be still at an early stage without clarifying this a bit more (i.e. wait at least for a larger implementation and see what they actually do exactly). I.e. I don't think there is any dramatic change proposed here at all.

seberg avatar Dec 08 '25 10:12 seberg

cc @hawkinsp

tqchen avatar Dec 11 '25 18:12 tqchen