[RFC] DLPack C Exchange API Attribute to PyCapsule
Background: in https://github.com/dmlc/dlpack/issues/175 we introduced the C exchange API. In the original discussion we considered proposals for exposing it both as a PyCapsule field and as a dunder. We discussed the choice in an array-api meeting session, where I think everyone agreed both paths can work, but we eventually went with an int.
After a recent discussion with PyTorch (https://github.com/pytorch/pytorch/issues/162845#issuecomment-3609127149), there is good feedback that having a PyCapsule with the name "dlpack_exchange_api" would provide better checking guarantees.
Given that in the original discussion everyone seemed to agree both can work, I think it is good to take this additional feedback and bring PyCapsule support here. This RFC proposes to update the convention so that the `__c_dlpack_exchange_api__` field holds a PyCapsule (with the name "dlpack_exchange_api").
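To make the proposal concrete, here is a minimal sketch of what the capsule name buys a consumer, using `ctypes.pythonapi` to stand in for the C API (the placeholder pointer value and the variable names are illustrative only, not part of the spec):

```python
import ctypes

# PyCapsule C-API entry points, accessed through ctypes for illustration.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_IsValid = ctypes.pythonapi.PyCapsule_IsValid
PyCapsule_IsValid.restype = ctypes.c_int
PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]

NAME = b"dlpack_exchange_api"

# Exporter side: wrap the (static) exchange-API struct pointer in a named
# capsule. 0x1234 is a placeholder for the real struct address.
capsule = PyCapsule_New(ctypes.c_void_p(0x1234), NAME, None)

# Consumer side: the capsule name gives a cheap sanity check that the
# attribute really is the exchange-API table and not an arbitrary object.
assert PyCapsule_IsValid(capsule, NAME) == 1
assert PyCapsule_IsValid(capsule, b"some_other_name") == 0
```

In C the consumer would call `PyCapsule_IsValid`/`PyCapsule_GetPointer` directly; the name check is what a bare int attribute cannot provide.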
cc @seberg @rgommers @leofang @dalcinl @albanD @ngimel
I'm a bit confused. What exactly is `__c_dlpack_exchange_api__` supposed to be? I would argue it has to be a callable, such that `capsule = exporter.__c_dlpack_exchange_api__()`, like the current `__dlpack[_device]__` dunders.
It was discussed as a class attribute and not a method, since there should never be a reason to pass any arguments.
That said, now that you mention it, a class attribute is slightly more annoying to implement, since it has to be put into the tp_dict in the C-API. (Overhead-wise it might still be better, since at least you can do a `getattr(x, "attr", NotImplemented)` to avoid an explicit hasattr check or errors; I can't think of a "call if exists" style API.)
I don't see much of a downside to using a capsule; it seems largely a matter of taste. It adds a bit of safety, although also a tiny bit of overhead if you use a capsule name. (Of course, all of these overheads are 10-20ns at most.)
It was discussed as a class attribute. That being said, in the PyTorch discussion the callable choice was indeed brought up by @junrushao. Explicitly listing out the choices:
- A0a: Take `__c_dlpack_exchange_api__` as a class attribute holding a PyCapsule.
- A0b: Keep the original `__c_dlpack_exchange_api__` as a class attribute holding an int. (I think we should move away from this one; since it is only in the latest release and there are not a lot of implementations yet, changing it is likely OK.)
- A1: Update the dunder to be a method which returns a PyCapsule.
  - A1a: Use `__c_dlpack_exchange_api__` as the method name.
  - A1b: Use `__dlpack_c_exchange_api__` as the method name.
Would be great if people can chime in with their preference.
Note on consumer perf: performance-wise the choices won't be too big of an issue. A consumer can use a TLS cache dict (in case of a future free-threaded Python), e.g. `dlpack_exchange_api_dict: dict[type, int]`, and do lazy population if needed, which likely gets us the same efficiency as a raw int attribute.
Speaking my own thoughts after seeing the latest discussions: I do not have a strong preference between A0 and A1; I think they can all work. If we go with A1, perhaps A1b `__dlpack_c_exchange_api__` is a slightly better name, since the prefix aligns with the other dlpack dunder methods.
> It was discussed as a class attribute.
Then checking for the attribute, with let's say `hasattr()`, always triggers all the machinery creating the capsule. That's what I do not like about it being an attribute.
True, but since it is a static object (and has to be, as per the standard), you would be able to cache it anyway.
> but since it is a static object
Oh, of course, never mind.
Thanks @seberg @dalcinl, do you mind replying with your updated thoughts on A0/A1a/A1b?
> performance-wise the choices won't be too big of an issue, consumer can use a TLS
Is that true? Classes can override the method per instance, so we cannot actually cache per type if it is a method. It would need to be a classmethod (or staticmethod) that takes a particular instance as input for this cache to work.
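The per-instance override problem can be demonstrated directly (the class below is hypothetical, not a real exporter):

```python
class Exporter:
    def __dlpack_c_exchange_api__(self):
        return "class-level capsule"

obj = Exporter()
# If the dunder is an ordinary method, an instance can shadow it...
obj.__dlpack_c_exchange_api__ = lambda: "instance-level capsule"

# ...so a per-type cache built from the class method would return the
# wrong thing for this instance:
assert obj.__dlpack_c_exchange_api__() == "instance-level capsule"
assert Exporter.__dlpack_c_exchange_api__(obj) == "class-level capsule"
```

Looking the attribute up on `type(obj)` only, as the standard specifies, sidesteps this by making instance-level shadowing invisible to consumers.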
The difference in perf that I can see between A0 and A1 (staticmethod), for a user calling this from C, is really: do a Python call + wrap the DLPack tensor into a PyCapsule + have the C code get the pointer from the PyCapsule. Some very approximate timing via timeit puts that overhead at ~80ns on my local CPython build.
While I do agree that A1 is a nicer way to expose this as a public API on a Python object, I can understand that the extra overhead for C callers could defeat the purpose of this special fast exchange API, so A0 sounds fair as well.
> Classes can override the method per-instance.
@albanD this is a great point; that would indeed make A0 more appealing.
@tqchen As ultimate performance is the whole point of the new thing, everything seems to indicate that A0 is a better choice indeed.
Additionally, I would like to second your proposal of using __dlpack_c_exchange_api__ for the name of the attribute (https://github.com/dmlc/dlpack/issues/179#issuecomment-3612022617) to align better with the current dlpack dunder naming.
> Is that true? Classes can override the method per-instance. So we cannot actually cache per-type if it is a method.
We specified that it must be cacheable and the underlying struct must be static (so you don't even have to hold on to the capsule, which was obvious when it was an integer :)).
We even said that the correct use is to look it up on the type: `type(obj).__dlpack_c_exchange_api__`, so instances are not involved and the type can be assumed immutable.
Because of that, you don't even need a TLS (if you care about subinterpreters, I guess you may need a per-interpreter cache); the cache can be global (except for race conditions when modifying it).
Now, class vs. instance: I much prefer class (to prevent misguided forwarding). One note: on older Python versions, class attribute lookups are much slower on failure (and failure is important, since `hasattr(obj, "...")` is likely to be false).
This seems solved, so I am not too worried about it, but I did not check in which version it was fixed (you can try with `getattr(int, "invalid", None)` vs. `getattr(0, "invalid", None)`).
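The comparison mentioned above can be tried directly with `timeit` (the absolute numbers will vary by CPython version; this only contrasts a failing class lookup with a failing instance lookup):

```python
import timeit

# Failing attribute lookup on a class vs. on an instance; both fall back
# to the default without raising.
t_class = timeit.timeit('getattr(int, "invalid", None)', number=100_000)
t_inst = timeit.timeit('getattr(0, "invalid", None)', number=100_000)

assert getattr(int, "invalid", None) is None
assert getattr(0, "invalid", None) is None
print(f"class lookup: {t_class:.4f}s, instance lookup: {t_inst:.4f}s")
```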
- An attribute will save the vectorcall and thus a few nanoseconds at least.
- I recently noticed that adding class attributes to C types is slightly annoying. The reason is that while class methods are supported via `tp_methods`, there is no C-level mechanism for attributes. This is not a big problem: you can put it into the type's `tp_dict` (typically one does not touch the `tp_dict` directly, but here you have to).

While I find that `tp_dict` dance rather annoying, I don't mind it much if we want to save those nanoseconds.
To be clear: My understanding is that you must use the tp_dict dance if you want type(obj).__protocol__ to return the capsule, which is desirable to enforce the static nature of it.
Unless we are re-considering the class lookup requirement itself? I like staying in C. The one thing that we cannot do with a class lookup and a capsule is define wrapper classes such as:
```python
class MyClass:
    def __init__(self, arr):
        self.data = arr

    @property  # or say a class attribute...
    def __dlpack_c_exchange_api__(self):
        # how to forward to `self.data`?
        ...
```
For that to work, we would actually have to extend this protocol to allow an instance attribute in some form (which then must return an object that defines the C-level protocol/class attribute!).
(To be clear, I am happy to not worry about this now; one could implement it on the consumer side by checking whether `type(obj).__dlpack_c_exchange_api__` is the capsule, and if not, doing an instance lookup and repeating the type lookup on the returned object.)
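That consumer-side fallback could look like this in Python (the names and the `is_exchange_capsule` helper are hypothetical stand-ins; in C the check would be `PyCapsule_IsValid(cap, "dlpack_exchange_api")`):

```python
def is_exchange_capsule(obj):
    # Hypothetical stand-in for PyCapsule_IsValid(obj, "dlpack_exchange_api").
    return isinstance(obj, str) and obj.startswith("capsule:")

def resolve_exchange_api(obj):
    # 1. Normal path: static class attribute holding the capsule.
    api = getattr(type(obj), "__dlpack_c_exchange_api__", None)
    if is_exchange_capsule(api):
        return api
    # 2. Wrapper path: an instance attribute yields an inner exporter;
    #    repeat the type lookup on that object.
    inner = getattr(obj, "__dlpack_c_exchange_api__", None)
    if inner is not None and not is_exchange_capsule(inner):
        return getattr(type(inner), "__dlpack_c_exchange_api__", None)
    return inner

# Hypothetical exporter and wrapper to exercise both paths:
class Inner:
    __dlpack_c_exchange_api__ = "capsule:inner"

class Wrapper:
    def __init__(self, data):
        # instance attribute forwards to the wrapped exporter
        self.__dlpack_c_exchange_api__ = data

assert resolve_exchange_api(Inner()) == "capsule:inner"
assert resolve_exchange_api(Wrapper(Inner())) == "capsule:inner"
```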
Thanks all for the remarks; it seems we are converging towards the following A0 proposal:
- Keep the dunder as a static class attribute, stored as a PyCapsule; consumers look it up via `type(obj).__dlpack_c_exchange_api__` only.
- Naming of the dunder: use `__dlpack_c_exchange_api__` to align with existing dlpack dunders.
I think the naming part is minor, but I also think we can go with this one, so that all dlpack dunders start with the prefix dlpack. Would be great to get a final confirmation; then we can proceed with the changes.
cc @seberg @rgommers @leofang @dalcinl @albanD @ngimel
PR is up at https://github.com/dmlc/dlpack/pull/180
Unfortunately, I have been unable to follow the recent, rapid advancement in DLPack, and do not fully understand what we are trying to accomplish with the added complexity. I worry that we are rushing into uncharted territory without enough buy-in from the wider community. Would it be possible to discuss this, e.g., in the next array API meeting, and give other stakeholders time to chime in?
@leofang of course, happy to do that in the upcoming array API meeting, as we did last time.
Oh, it was not my intent to propose a dramatic change to dlpack without sufficient buy-in from the community! On behalf of tvm-ffi, I'd love to see and facilitate dlpack adoption in frameworks such as PyTorch, and that is why I was weighing in and trying to figure out what works best. I'm happy to follow whatever the committee believes makes sense.
@junrushao I think this moved quickly, with (to me!) unclear (experimental) implementations or experience with it, and I think this request to change something quite central shows that a bit (I may well be missing an overview from the tvm-ffi side or other stakeholders!). From a NumPy PoV, I would probably have to consider DLPack 1.2 to still be at an early stage without clarifying this a bit more (i.e., wait at least for a larger implementation and see what they actually do). That is to say, I don't think there is any dramatic change proposed here at all.
cc @hawkinsp