numpy icon indicating copy to clipboard operation
numpy copied to clipboard

DOC: the recommended ruff rule for migration to numpy2 is missing many detections and fixes

Open ppashakhanloo opened this issue 1 year ago • 8 comments

Issue with current documentation:

Hello, Numpy folks! There is a ruff rule (NPY201) that is recommended for migration from numpy to numpy2. I did a thorough comparison between Python API removals and deprecations with what the NPY201 rule provides. There are many removals and deprecations that are in the documentation but the ruff rule does not consider. In many of the cases, a fix could be performed automatically.

Here is a summary of what is missing from the rule NPY201:

Changes that are not detected at all

  • np.geterrobj, np.seterrobj and the related ufunc keyword argument extobj= have been removed. The preferred replacement for all of these is using the context manager with np.errstate():.
  • removed ERR_*, SHIFT_*, np.kernel_version, np.numarray, np.oldnumeric and np.set_numeric_ops.
  • removed namespaces: np.FLOATING_POINT_SUPPORT, np.FPE_*, np.CLIP, np.WRAP, np.RAISE, np.BUFSIZE, np.UFUNC_BUFSIZE_DEFAULT, np.UFUNC_PYVALS_NAME, np.ALLOW_THREADS, np.MAXDIMS, np.MAY_SHARE_EXACT, np.MAY_SHARE_BOUNDS.
  • np.issctype, np.maximum_sctype, np.obj2sctype, np.sctype2char, np.sctypes were all removed from the main namespace without replacement, as they where niche members.
  • np.compare_chararrays has been removed from the main namespace. Use np.char.compare_chararrays instead.
  • The charrarray in the main namespace has been deprecated. It can be imported without a deprecation warning from np.char.chararray for now, but we are planning to fully deprecate and remove chararray in the future.
  • np.format_parser has been removed from the main namespace. Use np.rec.format_parser instead.
  • The experimental numpy.array_api submodule has been removed. Use the main numpy namespace for regular usage instead, or the separate array-api-strict package for the compliance testing use case for which numpy.array_api was mostly used.
  • Support for seven data type string aliases has been removed from np.dtype: int0, uint0, void0, object0, str0, bytes0 and bool8.
  • np.trapz has been deprecated. Use np.trapezoid or a scipy.integrate function instead.
  • np.in1d has been deprecated. Use np.isin instead.
  • Arrays of 2-dimensional vectors for np.cross have been deprecated. Use arrays of 3-dimensional vectors instead.
  • np.dtype("a") alias for np.dtype(np.bytes_) was deprecated. Use np.dtype("S") alias instead.
  • Use of keyword arguments x and y with functions assert_array_equal and assert_array_almost_equal has been deprecated. Pass the first two arguments as positional arguments instead.

Changes that are detected but no automatic fix is provided

  • np.cast has been removed. The literal replacement for np.cast[dtype](arg) is np.asarray(arg, dtype=dtype).
  • np.set_string_function has been removed. Use np.set_printoptions instead with a formatter for custom printing of NumPy objects.
  • np.recfromcsv and recfromtxt are now only available from np.lib.npyio.
  • np.asfarray has been removed. Use np.asarray with a proper dtype instead.
  • np.find_common_type has been removed. Use numpy.promote_types or numpy.result_type instead. To achieve semantics for the scalar_types argument, use numpy.result_type and pass 0, 0.0, or 0j as a Python scalar instead.
  • np.nbytes has been removed. Use np.dtype(<dtype>).itemsize instead.

Idea or request for content:

My suggestion is to add these changes to the ruff rule. Otherwise, it would be really helpful if the provided and not-provided changes were documented (either in the corresponding ruff rule page or the migration guide) so the users would know what they were getting (and not getting) by applying the rule.

ppashakhanloo avatar Jun 26 '24 14:06 ppashakhanloo

Ping @mtsokol in case any of this was done on purpose.

ngoldbaum avatar Jun 26 '24 14:06 ngoldbaum

Thank you for writing a thorough comparison!

Changes that are not detected at all

I think that most of these members were niche, and GitHub search showed none or small number of usages (like https://github.com/search?q=np.FLOATING_POINT_SUPPORT&type=code).

Also the migration guide mentions all of them: https://numpy.org/doc/stable/numpy_2_0_migration_guide.html (I think np.format_parser and data type string aliases are missing, and two np.char related ones).

Generally I think that most of them could be still added to the Ruff rule.

Other comments: For np.cross we have a deprecation warning, I don't think it can be detected by a Ruff rule. assert_array_equal also has a deprecation warning. np.array_api was experimental, therefore I don't think we need a check for it.

Changes that are detected but no automatic fix is provided

For these points I think it's just a matter of figuring out how to achieve it in Rust. I think that current migration messages are clear and I expect that downstream libraries would use these API members in a couple of places at most, therefore a manual intervention is still feasible offhand. The rule definitely excels for e.g. replacing np.float_ with np.float64 that can occur in hundred of places in the codebase (like it did in SciPy). One side note, np.recfromcsv and np.recfromtxt aren't exported by npyio (uh, the release note is outdated), instead genfromtxt should be used (as the migration guide and Ruff rule explain).

mtsokol avatar Jun 26 '24 15:06 mtsokol

np.issctype, np.maximum_sctype, np.obj2sctype, np.sctype2char, np.sctypes were all removed from the main namespace without replacement, as they where niche members.

I think ruff Rule detects all of them.

mtsokol avatar Jun 27 '24 08:06 mtsokol

Will be partially fixed in https://github.com/astral-sh/ruff/pull/12065.

mtsokol avatar Jun 27 '24 10:06 mtsokol

PyTensor uses np.MAXDIMS. Information on how to replace this could be useful. (Although it seems that using NPY_RAVEL_AXIS should work.)

brendan-m-murphy avatar Aug 07 '24 08:08 brendan-m-murphy

I'm a little confused - are you using the python np.MAXDIMS symbol or the C NPY_MAXDIMS macro? For the latter, the discussion about NPY_RAVEL_AXIS is what you likely need in the migration guide. For the former, can you share what you were doing with that symbol in python so we can suggest what you should do?

ngoldbaum avatar Aug 07 '24 16:08 ngoldbaum

The code was using both. For np.MAXDIMS, this was usually to store an int that would be saved in a "params" dict that could be used by python objects, but could also be translated to C code. The NPY_MAXDIMS macro was used in some C code for some of the operations in Pytensor. Where NPY_MAXDIMS was used in (python functions that create) C code, I switched to NPY_RAVEL_AXIS. For places where np.MAXDIMS needed to be stored as an int in python code, I set the value to either 32 or the smallest int64, depending on the version on numpy.

The code base is pretty uneven. (Pytensor is a fork of a fork of Theano... they're starting to move away from the C code but they still need to support it. Parts of the code essentially compile algebraic operations into CPython code, then compile it, and import the resulting python module.)

This commit has the np.MAXDIMS work around: https://github.com/brendan-m-murphy/pytensor/pull/2/commits/9a248eb5d8df9f06fa4bf8695e8d4002dbaac778

brendan-m-murphy avatar Aug 07 '24 16:08 brendan-m-murphy