
Support for numpy.dtypes.StringDType in AtomGroup containers?

Open · jauy123 opened this issue 11 months ago · 1 comment

In NumPy 2.0+, the new dtype for variable-width strings is `numpy.dtypes.StringDType()`. However, I don't believe MDAnalysis supports it at all?

Consider the following code:

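```python
import MDAnalysis as mda
import numpy as np
# PSF/DCD test files ship with the MDAnalysisTests package
from MDAnalysis.tests.datafiles import PSF, DCD

u = mda.Universe(PSF, DCD)
is_not_string = u.residues.resnames  # residue names as an ndarray
```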

Running `print(is_not_string.dtype)` yields `object`. Because the dtype is `object`, none of the new NumPy 2.0+ `numpy.strings` functions will work on it, and the user has to manually cast the array from the `object` dtype to `numpy.dtypes.StringDType` before using them. Wouldn't it be easier to have MDAnalysis create these arrays as `StringDType` automatically, for convenience?

Relevant NumPy documentation:

- https://numpy.org/doc/stable/user/basics.strings.html
- https://numpy.org/doc/stable/reference/routines.strings.html#module-numpy.strings

jauy123 · Dec 20 '24 18:12

At the moment we still support numpy ≥ 1.23.2

https://github.com/MDAnalysis/mdanalysis/blob/59e478db53ffb974fe94539bfc520c84a1946e72/package/pyproject.toml#L32

so we cannot use features only available in numpy 2.0+.

I am not sure when we will stop supporting numpy 1.x; according to SPEC 0, possibly in about 2 years (~end of 2026). When that happens, we can use the new types.
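To illustrate the version constraint, here is a sketch of what a runtime check for `StringDType` could look like while NumPy 1.x is still supported. This is not something MDAnalysis does or plans; `HAS_STRINGDTYPE` and `as_string_array` are hypothetical names. The check looks for `numpy.dtypes.StringDType` (added in NumPy 2.0) and otherwise falls back to the `object` dtype used today.

```python
import numpy as np

# True when the running NumPy exposes numpy.dtypes.StringDType (NumPy >= 2.0).
# numpy.dtypes itself only exists from NumPy 1.25, hence the double check.
HAS_STRINGDTYPE = hasattr(np, "dtypes") and hasattr(np.dtypes, "StringDType")


def as_string_array(values):
    """Hypothetical helper: build a string array on NumPy 1.x or 2.x.

    Uses StringDType when available, otherwise the object dtype.
    """
    if HAS_STRINGDTYPE:
        return np.array(values, dtype=np.dtypes.StringDType())
    return np.array(values, dtype=object)


# Usage: hypothetical residue names
resnames = as_string_array(["MET", "ARG", "ILE"])
```

The drawback is that the dtype users receive would then depend on their installed NumPy version, so code calling `numpy.strings` functions on the result would still fail under NumPy 1.x. Requiring NumPy 2.0 avoids that inconsistency.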

orbeckst · Jan 06 '25 23:01