Multiprecision in core, NumPy dtypes and Cythonization
Is your feature request related to a problem?
With the ongoing work on Cythonising Core as part of the CZI performance track (see #3683), we need to start a proper discussion of dtypes and multi-precision. This was first broached in #3545, which highlights some inconsistent use of NumPy dtypes and various examples of willy-nilly casting to and from `float32` and `float64`.
It is important to discuss what kinds of precision we want to support: when writing a compiled layer, the design can look very different depending on whether multiple datatypes are supported and whether templates need to be used.
While interoperability at the NumPy level in compiled land is not too difficult, auxiliary C++ data structures that own memory, such as a coordinate buffer, need careful thought.
My feeling is that while multiprecision support is not essential for speed, it is important for interoperability with other packages and for consistency throughout the package.
For me the most pressing issue is to decide whether we want to support a `dtype` argument to the core data structures and what that would mean downstream. I started on this for `Timestep` in #3683, but removed the `dtype` kwarg pending further discussion.
Current state of play
- Coordinates are always `float32`
- Unitcell is nominally `float32` in `Timestep` but is promoted in the routines in `calc_distances.h`
- You cannot specify what precision your coordinates are in as an argument to any of the `Core` datastructures
Describe the solution you'd like
I would like to see support for double precision and a `dtype` kwarg in the core datastructures.
For example:
```python
import MDAnalysis as mda
import numpy as np

u = mda.Universe(top, traj, dtype=np.float64)
ts = mda.coordinates.timestep.Timestep(n_atoms, dtype=np.float64)
```
Of course, we would preserve the current default of `np.float32`.
We could also support mixed-precision timesteps, where the `unitcell` and particle-dependent data such as `positions` can have different datatypes (see the sketch below); however, this is not ideal due to poor performance in regular ops and hindered compiler vectorisation and data pipelining.
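Purely as an illustration, a mixed-precision layout could look something like this (attribute names follow the existing `Timestep`, but the dtype split is hypothetical):

```python
import numpy as np

class MixedPrecisionTimestep:
    """Toy sketch only, not the actual Timestep class."""

    def __init__(self, n_atoms):
        self.dimensions = np.zeros(6, dtype=np.float64)            # unitcell in double
        self.positions = np.zeros((n_atoms, 3), dtype=np.float32)  # per-particle data in single
```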
Changes to support multiple precisions would likely need to be coupled with template-ization of `calc_distances.h`, which shouldn't be too challenging.
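As a rough sketch of what that could look like (Cython fused types shown here, since a C++ template version would be analogous; the function name and signature are illustrative, not the actual `calc_distances.h` API):

```cython
cimport cython
from libc.math cimport sqrt

# one source, two precisions: Cython generates a float and a double specialisation
ctypedef fused coord_t:
    float
    double

@cython.boundscheck(False)
@cython.wraparound(False)
def self_distances(coord_t[:, ::1] coords, coord_t[::1] out):
    """Pairwise distances computed in the precision of the input."""
    cdef Py_ssize_t i, j, k = 0
    cdef Py_ssize_t n = coords.shape[0]
    cdef coord_t dx, dy, dz
    for i in range(n):
        for j in range(i + 1, n):
            dx = coords[i, 0] - coords[j, 0]
            dy = coords[i, 1] - coords[j, 1]
            dz = coords[i, 2] - coords[j, 2]
            out[k] = <coord_t>sqrt(dx * dx + dy * dy + dz * dz)
            k += 1
```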
Describe alternatives you've considered
We stick to the current setup as we move forward with CZI stuff.
To discuss
I would love feedback from anyone, especially the @MDAnalysis/coredevs team, as this is potentially a large change.
On the surface it looks nice to be able to provide `dtype=np.float64` to the `Universe`. I like that it looks like numpy. Under the hood it will likely be a mess.
What is the advantage of being able to set the precision, i.e., what are the use cases that we want to support?
I might copy-paste a few pros/cons from #3545.
Benefits
- We move away from a haphazard system where some things are done in one precision and others in another on a case-by-case basis, shuffling datatypes back and forth.
- More accuracy can be passed through if required for certain things.
- If other packages with which we integrate offer or require higher precision, we become a guaranteed data-loss step.
- Related to the above: if people run their simulations at higher precision, we throw that all away (I imagine this is rare).
- May aid in closer integration with other packages in compiled languages.
- Supporting a choice of fundamental datatype, while preserving the current defaults, seems like a decent idea.
Drawbacks
- If it ain't broke, don't fix it: the current system is evidently satisfactory for most use cases.
- Large changes to `core`: a lot of effort, with plenty that can go wrong, for how much benefit?
- Is input data even of a high enough precision to warrant respecting `dtype`? Not sure how strong this argument is, as I can't foresee all uses.
- Double precision is slower
- How much do users actually gain in accuracy in a typical calculation? Is a 1e-? difference in a distance worth the trouble?
- Would we have to add a large chunk of tests? All the regression tests with FP values would be different under a different precision model.
Another note here is that if we do decide to restrict to `float32`, we can get better performance out of the numpy-cython C-API parts of the library by enabling C-level typing of the numpy arrays, e.g.
```cython
cimport numpy as cnp
cnp.import_array()
ctypedef cnp.float32_t DTYPE_t
# typed view of an existing (n, 3) float32 array gives C-level element access
cdef cnp.ndarray[DTYPE_t, ndim=2] arr = positions
```
This seems hard to judge; on the surface my initial view is similar to Oli's. Is progress on the CZI objectives impeded by this, or is it just that if you optimise things the way they are, we may get locked into a rigid design?
I suppose one idea might be to not jump into this right away, but when writing new things or making changes, leaving the door open for a `dtype=np.float32` argument in some places, if it isn't too distracting.
That said, I know in SciPy, for example, we have quite a few functionalities that don't allow custom dtypes; pretty sure `KDTree` ultimately casts to `double` for reasonably high-precision distance work. I don't know that this is really considered a design flaw? Is the rise of half-precision machine learning another factor to consider here? Downcasting to half is simple enough from our output, though, and a custom analysis class could operate in half precision without too much disruption, I suppose, if there is even a genuine need for that rather than just feeding to some ML library.
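For concreteness, the downcast-on-the-way-out idea is essentially a one-liner (the file names here are placeholders):

```python
import MDAnalysis as mda
import numpy as np

u = mda.Universe("topol.tpr", "traj.xtc")    # placeholder inputs
half = u.atoms.positions.astype(np.float16)  # float32 -> float16 on the way out
```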
I don't think we are being impeded by the lack of datatype flexibility; if anything, adding it would make things a lot more complicated. However, as you say, the pursuit of pure performance using the current setup will likely come at the cost of some flexibility long term. This is why I thought this was worth some discussion now.
> I suppose one idea might be to not jump into this right away, but when writing new things or making changes, leaving the door open for a `dtype=np.float32` argument in some places, if it isn't too distracting.
This is the approach I took in Cythonising `Timestep` in #3683, but I am trying to avoid leaving too many dangling ends for someone to clean up later.
For mine, downcasting the output is no stress; it's more a question of whether anyone (?) uses MDA for high-ish precision things.
Did I see on Discord that some folks at a conference wanted multi-precision? Could someone summarize that here? For the specific linked PR gh-3544, that subset of changes looks like it could go in pretty soon IMO, even if we don't go too crazy on this just yet.
It would be nice for performance reasons to be single precision (Hugo and I have been working on vectorising, where width matters). Also, the current pattern of inputting single and returning double makes chaining calls trickier / less logical, as sketched below.
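A quick illustration of the single-in/double-out pattern, assuming the usual behaviour of `MDAnalysis.lib.distances`:

```python
import numpy as np
from MDAnalysis.lib.distances import distance_array

coords = np.random.random((10, 3)).astype(np.float32)  # positions arrive as float32
d = distance_array(coords, coords)
print(d.dtype)  # float64 -- promoted on return
d32 = d.astype(np.float32)  # chaining back into a float32 pipeline needs a downcast
```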
I think a `dtype` argument to `Universe`, with all functions being fused types/templates, is the best of both worlds.
So, I think I'm a bit more positive about changes like this in isolation if the tests pass and we're backwards compatible, at least initially. Not sure we should jump in on the bigger/more invasive changes right away, but gradually working toward the `dtype` flexibility may not hurt as accelerators with low-precision performance offerings become more popular, etc.
I think we're agreeing that flexibility is good and single-precision default is good (me for backward compatibility reasons, and you for performance), IIUC.
I'm also supposing that if we're mostly single-precision internally, but sometimes returning double, that it isn't too much of a "break" to start returning in the true precision used for the calculation, since the double type is a bit of a lie if it is just a return-time upcast.
Is the best strategy perhaps to work on type flexibility around the edges, e.g. on all ops, before relaxing explicit requirements such as this one? https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/timestep.pyx#L200
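A toy sketch (not the real `Timestep`; the names are illustrative) of what honouring a stored dtype at the boundary, rather than hard-coding `float32`, could look like:

```python
import numpy as np

class FlexibleTimestep:
    """Toy illustration only, not the actual Timestep class."""

    def __init__(self, n_atoms, dtype=np.float32):
        self._dtype = np.dtype(dtype)
        self.positions = np.zeros((n_atoms, 3), dtype=self._dtype)

    def set_positions(self, new_positions):
        # accept any input dtype; cast to the dtype this instance was built with
        self.positions[:] = np.asarray(new_positions, dtype=self._dtype)
```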
> Is the best strategy perhaps to work on type flexibility around the edges
Might be; at least for me, the idea of starting with the more straightforward changes sounds appealing. Worst-case scenario, you end up with a subset of functions that gain more flexibility even though the core library remains type-rigid for a while. Just my opinion though; I know some people don't like partial implementations as a whole.