scipy icon indicating copy to clipboard operation
scipy copied to clipboard

Audit usage of cython memoryviews, add `const` to allow readonly arrays

Open ev-br opened this issue 11 months ago • 15 comments

Starting from Cython >=3.0.8, readonly fused typed memoryviews are finally allowed. Meaning we can add const qualifiers to memoryviews so that readonly arrays do not get rejected. We should audit our usage of memoryviews with fused dtypes and sprinkle const pretty much everywhere. https://github.com/scipy/scipy/pull/20195 gives a representative example.

ev-br avatar Mar 06 '24 10:03 ev-br

gh-20085 contains what's needed in cluster for the test suite to pass (having JAX support is a handy way to check read-only arrays)

lucascolley avatar Mar 06 '24 11:03 lucascolley

@ev-br I can help out with this, but I'm not quite sure where to start. Any recommended places to look at?

Kai-Striega avatar Mar 07 '24 02:03 Kai-Striega

That would be great @Kai-Striega ! I would just grep pyx files by subpackage, along the lines of $ grep def scipy/optimize -rn to list all cython function declarations. For each of those, if it has memoryviews, add const to all which are not outputs.

IOW, def func(const double[::1] x, double[::1] out) means that x can be a regular or a readonly array, and out cannot be readonly.

ev-br avatar Mar 07 '24 09:03 ev-br

Great. I'll try to audit what needs to be done tonight/this weekend

EDIT: I did a quick run on this, here is the invocation and output:

~/Projects/scipy on main                                                                                                                                                                            at 13:02:25
❯ grep def --include ./scipy\*.pyx  -rnc
./scipy/cluster/_vq.pyx:32
./scipy/cluster/_optimal_leaf_ordering.pyx:15
./scipy/cluster/_hierarchy.pyx:117
./scipy/ndimage/src/_cytest.pyx:28
./scipy/ndimage/src/_ni_label.pyx:26
./scipy/linalg/_decomp_lu_cython.pyx:9
./scipy/linalg/_matfuncs_sqrtm_triu.pyx:4
./scipy/linalg/_cythonized_array_utils.pyx:40
./scipy/linalg/_solve_toeplitz.pyx:9
./scipy/optimize/_trlib/_trlib.pyx:38
./scipy/optimize/_group_columns.pyx:11
./scipy/optimize/_highs/cython/src/_highs_constants.pyx:0
./scipy/optimize/_highs/cython/src/_highs_wrapper.pyx:35
./scipy/optimize/_lsq/givens_elimination.pyx:9
./scipy/optimize/_bglu_dense.pyx:48
./scipy/optimize/tnc/_moduleTNC.pyx:14
./scipy/io/matlab/_streams.pyx:45
./scipy/io/matlab/_mio_utils.pyx:8
./scipy/io/matlab/_mio5_utils.pyx:109
./scipy/_lib/_ccallback_c.pyx:16
./scipy/_lib/_test_deprecation_def.pyx:3
./scipy/_lib/_test_deprecation_call.pyx:2
./scipy/_lib/messagestream.pyx:11
./scipy/special/_specfun.pyx:117
./scipy/special/_cdflib.pyx:494
./scipy/special/_ellip_harm_2.pyx:39
./scipy/special/_test_internal.pyx:18
./scipy/special/_comb.pyx:4
./scipy/fftpack/convolve.pyx:8
./scipy/interpolate/interpnd.pyx:75
./scipy/interpolate/_bspl.pyx:30
./scipy/interpolate/_ppoly.pyx:72
./scipy/interpolate/_rgi_cython.pyx:4
./scipy/sparse/csgraph/_shortest_path.pyx:63
./scipy/sparse/csgraph/_traversal.pyx:46
./scipy/sparse/csgraph/_flow.pyx:47
./scipy/sparse/csgraph/_tools.pyx:21
./scipy/sparse/csgraph/_matching.pyx:36
./scipy/sparse/csgraph/_reordering.pyx:15
./scipy/sparse/csgraph/_min_spanning_tree.pyx:4
./scipy/spatial/_qhull.pyx:241
./scipy/spatial/_ckdtree.pyx:89
./scipy/spatial/_voronoi.pyx:7
./scipy/spatial/_hausdorff.pyx:8
./scipy/spatial/transform/_rotation.pyx:166
./scipy/signal/_max_len_seq_inner.pyx:5
./scipy/signal/_peak_finding_utils.pyx:9
./scipy/signal/_upfirdn_apply.pyx:58
./scipy/signal/_spectral.pyx:5
./scipy/signal/_sosfilt.pyx:17
./scipy/stats/_levy_stable/levyst.pyx:11
./scipy/stats/_ansari_swilk_statistics.pyx:23
./scipy/stats/_rcont/rcont.pyx:11
./scipy/stats/_stats.pyx:101
./scipy/stats/_qmc_cy.pyx:50
./scipy/stats/_biasedurn.pyx:38
./scipy/stats/_unuran/unuran_wrapper.pyx:201
./scipy/stats/_sobol.pyx:47

Note that this significantly overestimates the work to be done, as we also include cdef statements inside the function

Kai-Striega avatar Mar 08 '24 01:03 Kai-Striega

@ev-br I've got a question for you: I tried to change the argument a on this line line to be const-qualified. However I get the following warnings:

scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:21402:79: warning: passing 'const float *' to parameter of type '__pyx_t_5scipy_6linalg_13cython_lapack_s *' (aka 'float *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    __pyx_f_5scipy_6linalg_13cython_lapack_sgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((float const  *) ( /* dim=1 */ ((char *) (((float const  *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
                                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:21855:79: warning: passing 'const double *' to parameter of type '__pyx_t_5scipy_6linalg_13cython_lapack_d *' (aka 'double *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    __pyx_f_5scipy_6linalg_13cython_lapack_dgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((double const  *) ( /* dim=1 */ ((char *) (((double const  *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
                                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:22310:79: warning: passing 'const __pyx_t_float_complex *' to parameter of type '__pyx_t_float_complex *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    __pyx_f_5scipy_6linalg_13cython_lapack_cgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((__pyx_t_float_complex const  *) ( /* dim=1 */ ((char *) (((__pyx_t_float_complex const  *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
                                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy/linalg/_cythonized_array_utils.cpython-311-darwin.so.p/_cythonized_array_utils.c:22769:79: warning: passing 'const __pyx_t_double_complex *' to parameter of type '__pyx_t_double_complex *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    __pyx_f_5scipy_6linalg_13cython_lapack_zgetrf((&__pyx_v_n), (&__pyx_v_n), (&(*((__pyx_t_double_complex const  *) ( /* dim=1 */ ((char *) (((__pyx_t_double_complex const  *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_3 * __pyx_v_a.strides[0]) )) + __pyx_t_4)) )))), (&__pyx_v_n), (&(__pyx_v_piv[0])), (&__pyx_v_info));
                                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 warnings generated.

I understand what the warnings mean. My question is whether we should also const qualify the relevant functions in the call to lapack?

Kai-Striega avatar Mar 12 '24 02:03 Kai-Striega

Ouch. I don't know whether

  • cython does sensible things in if lapack_t is float: ... selections below https://github.com/scipy/scipy/blob/9c98ee762a95cf46d8f91f1c1a5547701538d286/scipy/linalg/_cythonized_array_utils.pyx#L29
  • lapack bindings are prepared to handle const float* arguments. there is no notion of const in F77.

ev-br avatar Mar 12 '24 10:03 ev-br

Maybe @ilayn has an opinion here?

Kai-Striega avatar Mar 12 '24 21:03 Kai-Striega

Probably you will need the const version of that fused type if you want to pass const lapack_t. This whole templating business we have is just a neverending chore in my opinion (say, if compared to Rust generics)

So I am tempted to rewrite them spelled out four times for these helper functions rather than dealing with these fake-C constructs of Cython.

ilayn avatar Mar 13 '24 06:03 ilayn

Actually since @rgommers triggered me to write more C code instead of Cython in the scipy.special rewrites, I think it is better for us to have more C code than Cython for array manipulation funcs. We really don't benefit from Cython which is best for mostly quick bursts of typed arithmetic and for loops but instead we are extensively using arrays and low-level C access. That's both not useful for us and unfair to Cython.

I still don't have a full grasp of NumPy C API but I think I got the gist of it lately. So feel free to modify things as you see fit. I will circle back to these at some point anyways.

ilayn avatar Mar 13 '24 19:03 ilayn

Public Cython API we expose (cython_blas, cython_lapack, cython_special) is the exception to what is actionable in this issue. We tried adding const to signatures before, and found it was a backwards compatibility break. Cython considers any change in signature breaking, even if adding const in the equivalent C code would be perfectly compatible. So I suggest not touching public Cython API.

rgommers avatar Mar 14 '24 07:03 rgommers

We really don't benefit from Cython which is best for mostly quick bursts of typed arithmetic and for loops but instead we are extensively using arrays and low-level C access. That's both not useful for us and unfair to Cython

While this starts to border OT for this issue, I'm going to go on record that I disagree with the statement above. We definitely do benefit from both Cython and f2py, with all their shortcomings. The value we get from boundschecking, from abstracting away pointer arithmetics and refcounting is huge. Note that scipy.special is indeed special: in there you basically have scalar kernels + ugly magic of generating the ufuncs. I cannot believe anybody would seriously suggest switching back to generating the python glue manually.

So realistically, there are IMO two things to sort out if we want to gradually have more C++ than Cython:

  • what's the best way to generate the python glue. FWIW, thin cython wrappers are easy to write, easy to read, not error-prone, and well understood. What are the shortcomings?
  • what's the best way to handle arrays in C++. Not via pointer arithmerics. No, definitely not via NpyIter :-).

If we don't sort these out, we're going to end up in a mess which is way worse than the status quo. Do I have my answers? Sort of. In-progress. In https://github.com/scipy/scipy/pull/19753.

ev-br avatar Mar 14 '24 16:03 ev-br

The value we get from boundschecking, from abstracting away pointer arithmetics and refcounting is huge.

Ah, I don't mean to use less Cython. I want to use it where it matters. Let me clarify. I mean there is no significant speed benefit to do array operations in Cython. NumPy is already pretty performant, for many array operations. In fact often, you don't even need to type the numpy arrays. They are already fast enough.

However if you want to do certain things involving for loops and other ops there is crazy amount of performance that Cython offers. What I meant is that if we really want to get Cython benefits we probably should use it where it matters and not just for minute performance improvements. The function @Kai-Striega mentioned is a good example in my opinion. I wrote it when I did not have any other choice. However Cython's fused type handling is really not so robust and takes quite a bit of Python machinery around it to provide the right argument at the right contiguousness etc. It is getting better but quite not there yet in my opinion.

I think we are on the same page with your comments anyways. Regarding the two bullet points you mentioned,

  • Cython is injecting too much code in between and binary size increase. They are really great for C-function binding just do cimport ... extern ... and we are good to go. So in that sense we should use more of it. Definitely much easier than pybind11 or nanobind.
  • If we have to, I would side with using C and not C++. Then it is pretty mundane old-school flat memory with A[ row*n + col ] = 5. silliness with all unsafe stuff about C but at least very robust infrastructure/compiler support and backwards compatible language not changing incredibly every 3-5 years.

ilayn avatar Mar 14 '24 18:03 ilayn

I mean there is no significant speed benefit to do array operations in Cython. NumPy is already pretty performant, for many array operations.

Yes, until you need a for loop. We all know it, and you write exactly this below. So no preaching to the choir. My point is that needing a for loop is not a fringe rarity, and it should not be made prohibitively difficult.

Now, for cython producing too fat binaries, that's actually actionable (or at least researcheable). Cool!

  • what adds the most fat, what can an author of a cython module do to slim it down?
  • cython naturally encourages to mix pure python and typed C-like constructs. A pure guesswork is that it's this pure python which adds to the module size, is this correct?

Flat memory: yes, as long as I can switch the boundschecking on :-).

For C/C++ : oh beware. THere will be six opinions for every five devs :-). All the more important to formulate guidelines, on a common denominator, however informal.

We should probably take this discussion to a separate issue anyway.

ev-br avatar Mar 14 '24 18:03 ev-br

Bringing the discussion back to this issue: I think I'm going to leave these functions till last, or at least until we have reached a consensus regarding our use of Cython/C/C++.

Kai-Striega avatar Mar 14 '24 20:03 Kai-Striega

what adds the most fat, what can an author of a cython module do to slim it down?

See:

  • https://cython.readthedocs.io/en/latest/src/userguide/faq.html#how-do-i-reduce-the-size-of-the-binary-modules
  • https://github.com/cython/cython/issues/6045
  • https://github.com/cython/cython/issues/4425#issuecomment-1966330797
  • https://github.com/cython/cython/issues/6051

matusvalo avatar Mar 14 '24 21:03 matusvalo

This remains a thing that causes issues, so it'd be good to get it done. I edited the issue description to add a tracker per submodule. If you do one, please ensure to do a whole submodule and then tick it off in the tracker.

Cc @czgdp1807

rgommers avatar May 29 '24 07:05 rgommers

Sure. I will work on it. I think the issue will be completed when all the tasks here are done.

czgdp1807 avatar May 29 '24 08:05 czgdp1807

I opened 3 PRs today. #20872, #20873, #20874. Let me know if I should work on modifying tests for these const additions to make sure someone doesn't undo the const additions in future. Other than this I can't think of more benefits of adding/modifying tests. Let me know if I am missing out something.

czgdp1807 avatar Jun 03 '24 14:06 czgdp1807

Nice! I don't think we need tests for that. Removing const from anywhere is highly unlikely, and the kind of thing that should be caught from code review. I think we either test read-only arrays structurally across all APIs (which would be a lot of work), or we don't test it unless we have a good reason to (like a bug report).

rgommers avatar Jun 03 '24 16:06 rgommers

we either test read-only arrays structurally across all APIs

well, good job we are adding array API support and testing with read-only JAX arrays, then 😉

lucascolley avatar Jun 03 '24 16:06 lucascolley

_lib, fftpack, integrate, ndimage, special, io don't have any array usage in pyx files. So this issue is complete from my end.

czgdp1807 avatar Jun 05 '24 08:06 czgdp1807

io too?

lucascolley avatar Jun 05 '24 08:06 lucascolley

Yes. I forgot to mention it. io too doesn't have any array usage in pyx files.

czgdp1807 avatar Jun 05 '24 09:06 czgdp1807

great, thanks @czgdp1807 ! If we've missed any, JAX should tell us as array API standard support is expanded.

lucascolley avatar Jun 05 '24 09:06 lucascolley

Awesome, thanks for completing this. And yes, JAX will be a nice way to verify without having to write more tests:)

rgommers avatar Jun 05 '24 09:06 rgommers