cython icon indicating copy to clipboard operation
cython copied to clipboard

[ENH] Restrict instance methods on extension types to the actual type

Open pitrou opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Currently, it's possible to call instance methods of extension types with any self argument that's not of the desired type. For many extension types, this will result in a crash when accessing the C-level internals.

Example in PyArrow:

>>> import pyarrow as pa
>>> pa.Array.get_total_buffer_size(0)
Erreur de segmentation

Describe the solution you'd like.

Cython should, by default or when enabled through an option, guard against any call of extension type instance methods with a self that's not an instance of the type.

Describe alternatives you've considered.

  1. Manual type check in each instance method: works but extremely tedious.
  2. Cython type annotation in the method signature: this doesn't actually seem result in a runtime type check?
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index 59d2e91ef6..a2aa3288d4 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -1226,7 +1226,7 @@ cdef class Array(_PandasConvertible):
             size = GetResultValue(c_size_res)
         return size
 
-    def get_total_buffer_size(self):
+    def get_total_buffer_size(Array self):
         """
         The sum of bytes in each buffer referenced by the array.
 

Additional context

No response

pitrou avatar Apr 04 '24 15:04 pitrou

Note that "regular" C extension classes do not exhibit this problem as their method implementations are automatically guarded by a type check:

>>> import numpy as np
>>> np.ndarray.byteswap(0)
Traceback (most recent call last):
  Cell In[5], line 1
    np.ndarray.byteswap(0)
TypeError: descriptor 'byteswap' for 'numpy.ndarray' objects doesn't apply to a 'int' object

pitrou avatar Apr 04 '24 15:04 pitrou

Note that "regular" C extension classes do not exhibit this problem as their method implementations are automatically guarded by a type check [...]

This is probably why we don't check.

This issue only applies to cyfunctions (i.e. with the cython.binding directive set to True, which is now the default). Before that existing all Cython methods were "regular" C extension methods so had the check automatically built in. Which is presumably why we never added it in Cython.

I suspect it would be worth adding the check.

da-woods avatar Apr 04 '24 20:04 da-woods