pandas icon indicating copy to clipboard operation
pandas copied to clipboard

Cython 3.0 Checklist

Open jbrockmendel opened this issue 4 years ago • 14 comments

Collecting TODOs for once we bump to cython3:

  • [x] use conditional nogil e.g. here
  • [x] change labels.shape[0] -> len(labels) e.g. groupbyL965
  • [ ] update searchsorted usages to use fixed cython version
  • [ ] Move Timestamp methods to _Timestamp, assuming https://github.com/cython/cython/pull/3605 is merged
  • [x] in _libs.internals use from cpython.pyport cimport PY_SSIZE_T_MAX instead of cdef extern ...
  • [x] in _libs.util use from numpy cimport PyArray_CLEARFLAGS, NPY_ARRAY_C_CONTIGUOUS, NPY_ARRAY_F_CONTIGUOUS instead of cdef extern ...
  • [x] in _libs.parsers use from cpython.unicode cimport PyUnicode_FromString instead of cdef extern ...
  • [ ] in tslibs.parsing replace PyObject_Str with str (xref https://github.com/cython/cython/pull/3478)
  • [ ] from numpy cimport NPY_DATETIMEUNIT
  • [ ] Audit usages of arg : datetime annotations in tslibs. Can interfere with non-nano usages of Timestamp (xref https://github.com/pandas-dev/pandas/pull/54169)

@WillAyd am i missing anything (const object[:]?)? If so, go ahead and edit the OP.

jbrockmendel avatar May 16 '20 18:05 jbrockmendel

@jbrockmendelor @WillAyd Could any of you post an update of the current the status of the cython 3.0 support?

I'd like to work on this but would like to avoid running into duplicated work.

Also, which language level of cython are we currently on (2 or 3, or 3str)?

rainwoodman avatar May 11 '21 21:05 rainwoodman

Could any of you post an update of the current the status of the cython 3.0 support?

nothing really to do at the moment; we'll revisit once cython 3.0 is released

Also, which language level of cython are we currently on (2 or 3, or 3str)?

exclusively 3

jbrockmendel avatar May 11 '21 21:05 jbrockmendel

Thanks for the heads up. So there will be no replication of work.

Just discovered an additional problem to the current list: in reduction.pyx, we are mutating the .data member of ndarray. This operation is no longer supported in cython 3.0 and also deprecated in numpy. Is there already a plan / work in progress for eliminating the mutation of ndarray.data?

rainwoodman avatar May 11 '21 21:05 rainwoodman

Is there already a plan / work in progress for eliminating the mutation of ndarray.data?

Yah that one is pretty ugly. Three approaches have been discussed:

  1. find a supported numpy API to continue doing this mutation
    • I haven't looked for this and am not optimistic it exists
  2. instead of mutating, just create new ndarrays
    • this will be less performant than the mutation, but not that bad since it is just slicing, and way less kludgy
    • but last time I tried this i got segfaults
  3. rip out the relevant pieces of reduction.pyx and use the non-cython paths
    • we've recently significantly optimized the non-cython paths, so this isn't that bad; xref #40263

If you have another idea, or can make options 1 or 2 work, that'd be great.

jbrockmendel avatar May 11 '21 22:05 jbrockmendel

Thanks. I have a PR along the lines of 1: The approach is more like find a supported Cython way to use the deprecated numpy API till we decide on 2 or 3.

After this PR there is a segment fault from an apparently infinite recursive looking up of a typeslot inside the tslib. I feel this is more likely a Cython bug than a pandas bug.

@jbrockmendel: any pointers as of where to look at on the pandas side?

@da-woods, @scoder: any pointers on the cython side?

Here is a piece of the stacktrace:

#13 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#14 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850, 
    right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#15 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#16 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850, 
    right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#17 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#18 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850, 
    right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#19 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#20 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850, 
    right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#21 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#22 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850, 
                                                                                
static CYTHON_INLINE PyObject *__pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot(PyTypeObject* type, PyObject *left, PyObject *right ) {
    binaryfunc slot;                                                            
#if CYTHON_USE_TYPE_SLOTS || PY_MAJOR_VERSION < 3 || CYTHON_COMPILING_IN_PYPY   
    slot = type->tp_as_number ? type->tp_as_number->nb_add : NULL;              
#else                                                                           
    slot = (binaryfunc) PyType_GetSlot(type, Py_nb_add);                        
#endif                                                                          
    return slot ? slot(left, right ) : __Pyx_NewRef(Py_NotImplemented);         
} 

rainwoodman avatar May 17 '21 22:05 rainwoodman

@rainwoodman

Special methods for binary operators now follow Python semantics. Rather than e.g. a single __add__ method for cdef classes, where "self" can be either the first or second argument, one can now define both __add__ and __radd__ as for standard Python classes. This behavior can be disabled with the c_api_binop_methods directive to return to the previous semantics in Cython code (available from Cython 0.29.20), or the reversed method (__radd__) can be implemented in addition to an existing two-sided operator method (__add__) to get a backwards compatible implementation. (Github issue :issue:2056)

I don't quite see why it's looking forever now (it's possibly a bug...) but I think that's the relevant change and hopefully c_api_binop_methods should disable it.

Not hugely confident of this diagnosis though...

da-woods avatar May 18 '21 06:05 da-woods

Special methods for binary operators now follow Python semantics

Neat! We'll be able to get rid of a few non-pythonic code paths in Timestamp, Timedelta, Period, NaT.

jbrockmendel avatar May 18 '21 17:05 jbrockmendel

Adding c_api_binop_methods=True fixed all segfaults.

rainwoodman avatar May 18 '21 17:05 rainwoodman

I believe it's a bug https://github.com/cython/cython/issues/4172. It sounds like you can work around it for now though

da-woods avatar May 18 '21 18:05 da-woods

@lithomas1 can you check off the relevant boxes in the OP and possibly close?

jbrockmendel avatar Aug 11 '23 13:08 jbrockmendel

@rhshadrach @lithomas1 is this closable?

jbrockmendel avatar Nov 30 '23 15:11 jbrockmendel

Haven't been up to date on this issue, did we put back #54482 changes?

lithomas1 avatar Dec 04 '23 21:12 lithomas1

pyproject.toml has Cython~=3.0.5, and has since v2.2.0, so seems like this can be closed?

QuLogic avatar Feb 13 '24 10:02 QuLogic

@QuLogic - there are unchecked items in the OP here. At a glance, it isn't clear to me what has been done and what remains. If we can confirm all tasks have been done, then yes, this can be closed.

rhshadrach avatar Feb 13 '24 22:02 rhshadrach