Audit usage of cython memoryviews, add `const` to allow readonly arrays
Starting from Cython >= 3.0.8, readonly fused typed memoryviews are finally allowed. This means we can add `const` qualifiers to memoryviews so that readonly arrays do not get rejected. We should audit our usage of memoryviews with fused dtypes and sprinkle `const` pretty much everywhere.
https://github.com/scipy/scipy/pull/20195 gives a representative example.
gh-20085 contains what's needed in `cluster` for the test suite to pass (having JAX support is a handy way to check read-only arrays).
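For context, here is how a read-only NumPy array behaves (a hypothetical illustration, not SciPy code): a non-`const` memoryview argument requests a writable buffer, so arrays like this get rejected at call time.

```python
import numpy as np

# Read-only arrays arise e.g. from np.broadcast_to, np.frombuffer,
# or backends like JAX; here we simulate one by flipping the flag.
a = np.arange(4.0)
a.flags.writeable = False

# The exported buffer is read-only...
assert memoryview(a).readonly

# ...so any attempt to write through it fails, which is why a
# non-const `double[::1]` Cython argument rejects such arrays.
try:
    a[0] = 1.0
except ValueError as exc:
    print("rejected:", exc)
```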
@ev-br I can help out with this, but I'm not quite sure where to start. Any recommended places to look at?
That would be great @Kai-Striega !
I would just grep the `.pyx` files by subpackage, along the lines of `$ grep def scipy/optimize -rn`, to list all Cython function declarations. For each of those, if it has memoryviews, add `const` to all which are not outputs. IOW, `def func(const double[::1] x, double[::1] out)` means that `x` can be a regular or a readonly array, and `out` cannot be readonly.
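A NumPy-level sketch of that input/output distinction (illustrative only; `np.add` stands in for a Cython routine with a `const` input and a writable output):

```python
import numpy as np

x = np.arange(3.0)
x.flags.writeable = False      # read-only input: fine for a `const` argument
out = np.empty(3)

np.add(x, 1.0, out=out)        # ok: reads x, writes out
print(out)                     # [1. 2. 3.]

# A read-only output must be rejected, just like a non-const memoryview:
try:
    np.add(x, 1.0, out=x)
except ValueError as exc:
    print("rejected:", exc)
```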
Great. I'll try to audit what needs to be done tonight/this weekend
EDIT: I did a quick run on this, here is the invocation and output:
~/Projects/scipy on main at 13:02:25
❯ grep def --include ./scipy\*.pyx -rnc
./scipy/cluster/_vq.pyx:32
./scipy/cluster/_optimal_leaf_ordering.pyx:15
./scipy/cluster/_hierarchy.pyx:117
./scipy/ndimage/src/_cytest.pyx:28
./scipy/ndimage/src/_ni_label.pyx:26
./scipy/linalg/_decomp_lu_cython.pyx:9
./scipy/linalg/_matfuncs_sqrtm_triu.pyx:4
./scipy/linalg/_cythonized_array_utils.pyx:40
./scipy/linalg/_solve_toeplitz.pyx:9
./scipy/optimize/_trlib/_trlib.pyx:38
./scipy/optimize/_group_columns.pyx:11
./scipy/optimize/_highs/cython/src/_highs_constants.pyx:0
./scipy/optimize/_highs/cython/src/_highs_wrapper.pyx:35
./scipy/optimize/_lsq/givens_elimination.pyx:9
./scipy/optimize/_bglu_dense.pyx:48
./scipy/optimize/tnc/_moduleTNC.pyx:14
./scipy/io/matlab/_streams.pyx:45
./scipy/io/matlab/_mio_utils.pyx:8
./scipy/io/matlab/_mio5_utils.pyx:109
./scipy/_lib/_ccallback_c.pyx:16
./scipy/_lib/_test_deprecation_def.pyx:3
./scipy/_lib/_test_deprecation_call.pyx:2
./scipy/_lib/messagestream.pyx:11
./scipy/special/_specfun.pyx:117
./scipy/special/_cdflib.pyx:494
./scipy/special/_ellip_harm_2.pyx:39
./scipy/special/_test_internal.pyx:18
./scipy/special/_comb.pyx:4
./scipy/fftpack/convolve.pyx:8
./scipy/interpolate/interpnd.pyx:75
./scipy/interpolate/_bspl.pyx:30
./scipy/interpolate/_ppoly.pyx:72
./scipy/interpolate/_rgi_cython.pyx:4
./scipy/sparse/csgraph/_shortest_path.pyx:63
./scipy/sparse/csgraph/_traversal.pyx:46
./scipy/sparse/csgraph/_flow.pyx:47
./scipy/sparse/csgraph/_tools.pyx:21
./scipy/sparse/csgraph/_matching.pyx:36
./scipy/sparse/csgraph/_reordering.pyx:15
./scipy/sparse/csgraph/_min_spanning_tree.pyx:4
./scipy/spatial/_qhull.pyx:241
./scipy/spatial/_ckdtree.pyx:89
./scipy/spatial/_voronoi.pyx:7
./scipy/spatial/_hausdorff.pyx:8
./scipy/spatial/transform/_rotation.pyx:166
./scipy/signal/_max_len_seq_inner.pyx:5
./scipy/signal/_peak_finding_utils.pyx:9
./scipy/signal/_upfirdn_apply.pyx:58
./scipy/signal/_spectral.pyx:5
./scipy/signal/_sosfilt.pyx:17
./scipy/stats/_levy_stable/levyst.pyx:11
./scipy/stats/_ansari_swilk_statistics.pyx:23
./scipy/stats/_rcont/rcont.pyx:11
./scipy/stats/_stats.pyx:101
./scipy/stats/_qmc_cy.pyx:50
./scipy/stats/_biasedurn.pyx:38
./scipy/stats/_unuran/unuran_wrapper.pyx:201
./scipy/stats/_sobol.pyx:47
Note that this significantly overestimates the work to be done, since the count also includes `cdef` statements inside function bodies.
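A rough way to get a tighter count (a hypothetical helper, not part of SciPy) is to match only declarations at zero indentation, so indented `cdef` variable declarations inside function bodies are skipped:

```python
import re

# Hypothetical helper (not SciPy code): count only module-level
# def/cpdef/cdef declarations; indented `cdef` statements inside
# function bodies do not match because of the `^` anchor.
DECL = re.compile(r"^(?:def|cpdef|cdef)\s+\w+.*\(", re.MULTILINE)

def count_declarations(source: str) -> int:
    return len(DECL.findall(source))

sample = """\
def public_func(double[::1] x):
    cdef int i
    return x[0]

cdef double helper(const double[::1] y):
    cdef double acc = 0
    return acc
"""
print(count_declarations(sample))  # 2
```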
@ev-br I've got a question for you: I tried to change the argument `a` on this line to be `const`-qualified. However, I get the following warnings:
scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:21402:79: warning: passing 'const float *' to parameter of type '__pyx_t_5scipy_6linalg_13cython_lapack_s *' (aka 'float *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
__pyx_f_5scipy_6linalg_13cython_lapack_sgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((float const *) ( /* dim=1 */ ((char *) (((float const *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:21855:79: warning: passing 'const double *' to parameter of type '__pyx_t_5scipy_6linalg_13cython_lapack_d *' (aka 'double *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
__pyx_f_5scipy_6linalg_13cython_lapack_dgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((double const *) ( /* dim=1 */ ((char *) (((double const *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:22310:79: warning: passing 'const __pyx_t_float_complex *' to parameter of type '__pyx_t_float_complex *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
__pyx_f_5scipy_6linalg_13cython_lapack_cgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((__pyx_t_float_complex const *) ( /* dim=1 */ ((char *) (((__pyx_t_float_complex const *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:22769:79: warning: passing 'const __pyx_t_double_complex *' to parameter of type '__pyx_t_double_complex *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
__pyx_f_5scipy_6linalg_13cython_lapack_zgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((__pyx_t_double_complex const *) ( /* dim=1 */ ((char *) (((__pyx_t_double_complex const *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 warnings generated.
I understand what the warnings mean. My question is whether we should also `const`-qualify the relevant functions in the calls to LAPACK?
Ouch. I don't know whether
- Cython does sensible things in the `if lapack_t is float: ...` selections below https://github.com/scipy/scipy/blob/9c98ee762a95cf46d8f91f1c1a5547701538d286/scipy/linalg/_cythonized_array_utils.pyx#L29
- the LAPACK bindings are prepared to handle `const float*` arguments; there is no notion of `const` in F77.

Maybe @ilayn has an opinion here?
Probably you will need the `const` version of that fused type if you want to pass `const lapack_t`. This whole templating business we have is a neverending chore in my opinion (say, compared to Rust generics), so I am tempted to rewrite these helper functions spelled out four times rather than deal with these fake-C constructs of Cython.
Actually, since @rgommers triggered me to write more C code instead of Cython in the `scipy.special` rewrites, I think it is better for us to have more C code than Cython for array manipulation funcs. We really don't benefit from Cython, which is best for quick bursts of typed arithmetic and for loops; instead we are extensively using arrays and low-level C access. That's both not useful for us and unfair to Cython.
I still don't have a full grasp of NumPy C API but I think I got the gist of it lately. So feel free to modify things as you see fit. I will circle back to these at some point anyways.
The public Cython API we expose (`cython_blas`, `cython_lapack`, `cython_special`) is the exception to what is actionable in this issue. We tried adding `const` to those signatures before and found it was a backwards compatibility break: Cython considers any change in signature breaking, even if adding `const` in the equivalent C code would be perfectly compatible. So I suggest not touching the public Cython API.
> We really don't benefit from Cython which is best for mostly quick bursts of typed arithmetic and for loops but instead we are extensively using arrays and low-level C access. That's both not useful for us and unfair to Cython.
While this starts to border on off-topic for this issue, I'm going to go on record that I disagree with the statement above. We definitely do benefit from both Cython and f2py, with all their shortcomings. The value we get from bounds checking, and from abstracting away pointer arithmetic and refcounting, is huge. Note that `scipy.special` is indeed special: in there you basically have scalar kernels plus the ugly magic of generating the ufuncs. I cannot believe anybody would seriously suggest switching back to generating the Python glue manually.
So realistically, there are IMO two things to sort out if we want to gradually have more C++ than Cython:
- What's the best way to generate the Python glue? FWIW, thin Cython wrappers are easy to write, easy to read, not error-prone, and well understood. What are the shortcomings?
- What's the best way to handle arrays in C++? Not via pointer arithmetic. And definitely not via NpyIter :-).

If we don't sort these out, we're going to end up in a mess which is way worse than the status quo. Do I have my answers? Sort of. In progress, in https://github.com/scipy/scipy/pull/19753.
> The value we get from boundschecking, from abstracting away pointer arithmetics and refcounting is huge.
Ah, I don't mean to use less Cython; I want to use it where it matters. Let me clarify: there is no significant speed benefit to doing array operations in Cython. NumPy is already pretty performant for many array operations; in fact, often you don't even need to type the NumPy arrays, they are already fast enough.
However, if you want to do certain things involving for loops and other ops, there is a crazy amount of performance that Cython offers. What I meant is that if we really want to get Cython's benefits, we should use it where it matters, and not just for minute performance improvements. The function @Kai-Striega mentioned is a good example in my opinion; I wrote it when I did not have any other choice. However, Cython's fused type handling is really not so robust, and it takes quite a bit of Python machinery around it to provide the right argument with the right contiguousness etc. It is getting better, but not quite there yet in my opinion.
I think we are on the same page with your comments anyway. Regarding the two bullet points you mentioned:
- Cython injects too much code in between, and the binary size increases. It is really great for C-function binding: just do `cimport ... extern ...` and we are good to go, so in that sense we should use more of it. Definitely much easier than `pybind11` or `nanobind`.
- If we have to, I would side with using C and not C++. Then it is pretty mundane old-school flat memory, with the `A[row*n + col] = 5.` silliness and all the unsafe stuff about C, but at least very robust infrastructure/compiler support and a backwards-compatible language that doesn't change drastically every 3-5 years.
> I mean there is no significant speed benefit to do array operations in Cython. NumPy is already pretty performant, for many array operations.
Yes, until you need a for loop. We all know it, and you write exactly this below, so no preaching to the choir needed. My point is that needing a for loop is not a fringe rarity, and it should not be made prohibitively difficult.
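To make the for-loop point concrete: a first-order recurrence (the kind of loop-carried dependency `_sosfilt.pyx` deals with) has no single vectorized NumPy equivalent, so it has to be an explicit loop — slow in pure Python, and exactly where a typed Cython loop pays off. A sketch, not SciPy code:

```python
import numpy as np

def iir_first_order(x, alpha=0.5):
    """y[i] = x[i] + alpha * y[i-1]: a loop-carried dependency that
    cannot be expressed as a single vectorized NumPy call."""
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = v + alpha * acc
        y[i] = acc
    return y

print(iir_first_order(np.array([1.0, 0.0, 0.0])))  # 1, 0.5, 0.25
```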
Now, Cython producing too-fat binaries is actually actionable (or at least researchable). Cool!
- What adds the most fat, and what can an author of a Cython module do to slim it down?
- Cython naturally encourages mixing pure Python and typed C-like constructs. Pure guesswork: is it this pure Python which adds to the module size?

Flat memory: yes, as long as I can switch the bounds checking on :-).
For C/C++: oh, beware. There will be six opinions for every five devs :-). All the more important to formulate guidelines, on a common denominator, however informal.
We should probably take this discussion to a separate issue anyway.
Bringing the discussion back to this issue: I think I'm going to leave these functions till last, or at least until we have reached a consensus regarding our use of Cython/C/C++.
> what adds the most fat, what can an author of a cython module do to slim it down?
See:
- https://cython.readthedocs.io/en/latest/src/userguide/faq.html#how-do-i-reduce-the-size-of-the-binary-modules
- https://github.com/cython/cython/issues/6045
- https://github.com/cython/cython/issues/4425#issuecomment-1966330797
- https://github.com/cython/cython/issues/6051
This remains a thing that causes issues, so it'd be good to get it done. I edited the issue description to add a tracker per submodule. If you do one, please ensure to do a whole submodule and then tick it off in the tracker.
Cc @czgdp1807
Sure. I will work on it. I think the issue will be completed when all the tasks here are done.
I opened 3 PRs today: #20872, #20873, #20874. Let me know if I should work on modifying tests for these `const` additions, to make sure someone doesn't undo them in the future. Other than this I can't think of more benefits of adding/modifying tests; let me know if I am missing something.
Nice! I don't think we need tests for that. Removing `const` from anywhere is highly unlikely, and the kind of thing that should be caught in code review. I think we either test read-only arrays structurally across all APIs (which would be a lot of work), or we don't test them unless we have a good reason to (like a bug report).
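If we did want a lightweight structural check, it could look something like this (a hypothetical test helper, with `np.sort` standing in for any SciPy function under test):

```python
import numpy as np

def check_accepts_readonly(func, x):
    """Call `func` with a read-only copy of `x` and verify the result
    matches the writable case (i.e. the input was not rejected)."""
    ro = x.copy()
    ro.flags.writeable = False
    assert np.allclose(func(x), func(ro))

check_accepts_readonly(np.sort, np.array([3.0, 1.0, 2.0]))
print("ok")
```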
> we either test read-only arrays structurally across all APIs
well, good job we are adding array API support and testing with read-only JAX arrays, then 😉
`_lib`, `fftpack`, `integrate`, `ndimage`, `special`, and `io` don't have any array usage in `.pyx` files. So this issue is complete from my end.
`io` too?
Yes, I forgot to mention it. `io` too doesn't have any array usage in `.pyx` files.
great, thanks @czgdp1807 ! If we've missed any, JAX should tell us as array API standard support is expanded.
Awesome, thanks for completing this. And yes, JAX will be a nice way to verify without having to write more tests :)