DOC: the recommended ruff rule for migration to numpy2 is missing many detections and fixes
Issue with current documentation:
Hello, Numpy folks!
There is a ruff rule (NPY201) that is recommended for migration from numpy to numpy2. I did a thorough comparison between Python API removals and deprecations with what the NPY201 rule provides. There are many removals and deprecations that are in the documentation but the ruff rule does not consider.
In many of the cases, a fix could be performed automatically.
Here is a summary of what is missing from the rule NPY201:
Changes that are not detected at all
np.geterrobj,np.seterrobjand the related ufunc keyword argumentextobj=have been removed. The preferred replacement for all of these is using the context managerwith np.errstate():.- removed
ERR_*,SHIFT_*,np.kernel_version,np.numarray,np.oldnumericandnp.set_numeric_ops. - removed namespaces:
np.FLOATING_POINT_SUPPORT,np.FPE_*,np.CLIP,np.WRAP,np.RAISE,np.BUFSIZE,np.UFUNC_BUFSIZE_DEFAULT,np.UFUNC_PYVALS_NAME,np.ALLOW_THREADS,np.MAXDIMS,np.MAY_SHARE_EXACT,np.MAY_SHARE_BOUNDS. np.issctype,np.maximum_sctype,np.obj2sctype,np.sctype2char,np.sctypeswere all removed from the main namespace without replacement, as they where niche members.np.compare_chararrayshas been removed from the main namespace. Usenp.char.compare_chararraysinstead.- The
charrarrayin the main namespace has been deprecated. It can be imported without a deprecation warning fromnp.char.chararrayfor now, but we are planning to fully deprecate and removechararrayin the future. np.format_parserhas been removed from the main namespace. Usenp.rec.format_parserinstead.- The experimental
numpy.array_apisubmodule has been removed. Use the main numpy namespace for regular usage instead, or the separatearray-api-strictpackage for the compliance testing use case for whichnumpy.array_apiwas mostly used. - Support for seven data type string aliases has been removed from
np.dtype:int0,uint0,void0,object0,str0,bytes0andbool8. np.trapzhas been deprecated. Usenp.trapezoidor ascipy.integratefunction instead.np.in1dhas been deprecated. Usenp.isininstead.- Arrays of 2-dimensional vectors for
np.crosshave been deprecated. Use arrays of 3-dimensional vectors instead. np.dtype("a")alias fornp.dtype(np.bytes_)was deprecated. Usenp.dtype("S")alias instead.- Use of keyword arguments
xandywith functionsassert_array_equalandassert_array_almost_equalhas been deprecated. Pass the first two arguments as positional arguments instead.
Changes that are detected but no automatic fix is provided
np.casthas been removed. The literal replacement fornp.cast[dtype](arg)isnp.asarray(arg, dtype=dtype).np.set_string_functionhas been removed. Usenp.set_printoptionsinstead with a formatter for custom printing of NumPy objects.np.recfromcsvandrecfromtxtare now only available fromnp.lib.npyio.np.asfarrayhas been removed. Usenp.asarraywith a properdtypeinstead.np.find_common_typehas been removed. Usenumpy.promote_typesornumpy.result_typeinstead. To achieve semantics for thescalar_typesargument, usenumpy.result_typeand pass0,0.0, or0jas a Python scalar instead.np.nbyteshas been removed. Usenp.dtype(<dtype>).itemsizeinstead.
Idea or request for content:
My suggestion is to add these changes to the ruff rule. Otherwise, it would be really helpful if the provided and not-provided changes were documented (either in the corresponding ruff rule page or the migration guide) so the users would know what they were getting (and not getting) by applying the rule.
Ping @mtsokol in case any of this was done on purpose.
Thank you for writing a thorough comparison!
Changes that are not detected at all
I think that most of these members were niche, and GitHub search showed none or small number of usages (like https://github.com/search?q=np.FLOATING_POINT_SUPPORT&type=code).
Also the migration guide mentions all of them: https://numpy.org/doc/stable/numpy_2_0_migration_guide.html (I think np.format_parser and data type string aliases are missing, and two np.char related ones).
Generally I think that most of them could be still added to the Ruff rule.
Other comments: For np.cross we have a deprecation warning, I don't think it can be detected by a Ruff rule. assert_array_equal also has a deprecation warning. np.array_api was experimental, therefore I don't think we need a check for it.
Changes that are detected but no automatic fix is provided
For these points I think it's just a matter of figuring out how to achieve it in Rust.
I think that current migration messages are clear and I expect that downstream libraries would use these API members in a couple of places at most, therefore a manual intervention is still feasible offhand.
The rule definitely excels for e.g. replacing np.float_ with np.float64 that can occur in hundred of places in the codebase (like it did in SciPy).
One side note, np.recfromcsv and np.recfromtxt aren't exported by npyio (uh, the release note is outdated), instead genfromtxt should be used (as the migration guide and Ruff rule explain).
np.issctype, np.maximum_sctype, np.obj2sctype, np.sctype2char, np.sctypes were all removed from the main namespace without replacement, as they where niche members.
I think ruff Rule detects all of them.
Will be partially fixed in https://github.com/astral-sh/ruff/pull/12065.
PyTensor uses np.MAXDIMS. Information on how to replace this could be useful. (Although it seems that using NPY_RAVEL_AXIS should work.)
I'm a little confused - are you using the python np.MAXDIMS symbol or the C NPY_MAXDIMS macro? For the latter, the discussion about NPY_RAVEL_AXIS is what you likely need in the migration guide. For the former, can you share what you were doing with that symbol in python so we can suggest what you should do?
The code was using both. For np.MAXDIMS, this was usually to store an int that would be saved in a "params" dict that could be used by python objects, but could also be translated to C code. The NPY_MAXDIMS macro was used in some C code for some of the operations in Pytensor. Where NPY_MAXDIMS was used in (python functions that create) C code, I switched to NPY_RAVEL_AXIS. For places where np.MAXDIMS needed to be stored as an int in python code, I set the value to either 32 or the smallest int64, depending on the version on numpy.
The code base is pretty uneven. (Pytensor is a fork of a fork of Theano... they're starting to move away from the C code but they still need to support it. Parts of the code essentially compile algebraic operations into CPython code, then compile it, and import the resulting python module.)
This commit has the np.MAXDIMS work around: https://github.com/brendan-m-murphy/pytensor/pull/2/commits/9a248eb5d8df9f06fa4bf8695e8d4002dbaac778