Multiprecision in core, NumPy dtypes and Cythonization

Open hmacdope opened this issue 2 years ago • 12 comments

Is your feature request related to a problem?

With the ongoing work on Cythonising the core as part of the CZI performance track (see #3683), we need to start a proper discussion of dtypes and multi-precision support. This was first broached in #3545, which highlights some inconsistent use of NumPy dtypes and various examples of willy-nilly casting to and from float32 and float64.

It is important to discuss what kinds of precision we want to support: when writing a compiled layer, the design can look very different depending on whether multiple datatypes are supported and whether templates need to be used.

While interoperability at the NumPy level in compiled code is not too difficult, auxiliary C++ data structures that own memory, such as a coordinate buffer, need careful thought.

My feeling is that while multiprecision support is not essential for speed, it is important for interoperability with other packages and for consistency throughout the package.

For me the most pressing issue is to decide whether we want to support a dtype argument to the core data structures and what that would mean downstream. I started on this for Timestep in #3683, but removed the dtype kwarg pending further discussion.

Current state of play

  • Coordinates are always a float32
  • The unitcell is nominally a float32 in Timestep but is promoted to double in the routines in calc_distances.h
  • You cannot specify the precision of your coordinates as an argument to any of the core data structures

Describe the solution you'd like

I would like to see support for double precision and a dtype kwarg in the core data structures.

For example:

import numpy as np
import MDAnalysis as mda

u = mda.Universe(top, traj, dtype=np.float64)
ts = mda.coordinates.timestep.Timestep(n_atoms, dtype=np.float64)

Of course we would preserve the current default of np.float32.

We could also support mixed-precision timesteps, where the unitcell and per-particle data such as positions can have different datatypes; however, this is not ideal, as it hurts performance in regular ops, compiler vectorisation and data pipelining.

Changes to support multiple precisions would likely need to be coupled with templating calc_distances.h, which shouldn't be too challenging.
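As a rough illustration of what templating could look like on the Cython side (a minimal sketch only; the function name, signature and fused-type name are hypothetical, not the actual calc_distances.h API), a fused type lets one kernel compile to both a float32 and a float64 specialisation:

# hypothetical fused-type distance kernel, not MDAnalysis code
import numpy as np
cimport numpy as cnp
from libc.math cimport sqrt

cnp.import_array()

ctypedef fused coordinate_t:
    cnp.float32_t
    cnp.float64_t

def self_distances(coordinate_t[:, ::1] coords):
    cdef Py_ssize_t n = coords.shape[0]
    cdef Py_ssize_t i, j, k = 0
    cdef coordinate_t dx, dy, dz
    # result precision follows the input precision instead of always
    # being promoted to double
    if coordinate_t is cnp.float32_t:
        out = np.empty(n * (n - 1) // 2, dtype=np.float32)
    else:
        out = np.empty(n * (n - 1) // 2, dtype=np.float64)
    cdef coordinate_t[::1] result = out
    for i in range(n):
        for j in range(i + 1, n):
            dx = coords[i, 0] - coords[j, 0]
            dy = coords[i, 1] - coords[j, 1]
            dz = coords[i, 2] - coords[j, 2]
            result[k] = <coordinate_t> sqrt(dx * dx + dy * dy + dz * dz)
            k += 1
    return out

A float32 array would then dispatch to the float32 specialisation and a float64 array to the float64 one, so promotion to double becomes opt-in rather than forced.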

Describe alternatives you've considered

We stick to the current setup as we move forward with CZI stuff.

To discuss

I would love feedback from anyone, especially the @MDAnalysis/coredevs team, as this is potentially a large change.

hmacdope avatar Jun 07 '22 00:06 hmacdope

On the surface it looks nice to be able to provide dtype=np.float64 to the Universe. I like that it looks like numpy.

Under the hood it will likely be a mess.

What is the advantage of being able to set the precision, i.e., what are the use cases that we want to support?

orbeckst avatar Jun 07 '22 01:06 orbeckst

I might copy-paste a few pros/cons from #3545.

Benefits

  • We move away from the more haphazard system of doing some things in one precision and others in another on a case-by-case basis, shuffling datatypes back and forth.
  • More accuracy can be passed through where required for certain things.
  • If other packages with which we integrate offer or require higher precision, we are a guaranteed data-loss step.
  • Related to the above: if people run their simulations at higher precision, we throw that all away (I imagine this is rare).
  • May aid in closer integration with other packages in compiled languages.
  • Supporting a fundamental datatype choice, while preserving the current defaults, seems like a decent idea.

Drawbacks

  • If it ain't broke, don't fix it: the current system is obviously satisfactory for most use cases.
  • Large changes to core: a lot of effort with plenty that can go wrong, for how much benefit?
  • Is input data even of high enough precision to warrant respecting dtype? Not sure how strong this argument is, as I can't foresee all uses.
  • Double precision is slower.
  • How much do users actually gain in accuracy in a typical calculation? Is a 1e-? difference in a distance worth the trouble?
  • Would we have to add a large chunk of tests? All the regression tests with FP values would be different under a different precision model.

hmacdope avatar Jun 07 '22 01:06 hmacdope

Another note here is that if we do decide to restrict to float32, we can get better performance out of the NumPy-Cython C-API parts of the library by enabling C-level typing of the NumPy arrays, e.g.

cimport numpy as cnp
cnp.import_array()

# fix the element type at the C level (single precision here)
ctypedef cnp.float32_t DTYPE_t

cdef cnp.ndarray[DTYPE_t, ndim=2] arr
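For instance (a hypothetical helper just to show the effect, not existing MDAnalysis code), typing the buffer in a function signature lets Cython index it with direct C array access:

# hypothetical helper showing the effect of C-level typing
cimport numpy as cnp
cnp.import_array()

ctypedef cnp.float32_t DTYPE_t

def sum_positions(cnp.ndarray[DTYPE_t, ndim=2] arr):
    cdef Py_ssize_t i, j
    cdef DTYPE_t total = 0
    # typed indexing avoids the generic Python-object access path
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            total += arr[i, j]
    return total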

hmacdope avatar Jun 14 '22 23:06 hmacdope

This seems hard to judge; on the surface my initial view is similar to Oli's. Is progress on the CZI objectives impeded by this, or is it just that if you optimize things the way they are, we may get locked into a rigid design?

I suppose one idea might be to not jump into this right away, but when writing new things or making changes, leave the door open for a dtype=np.float32 argument in some places, if it isn't too distracting.

That said, I know in SciPy, for example, we have quite a few functionalities that don't allow custom dtypes; pretty sure KDTree ultimately casts to double for reasonably high-precision distance work. I don't know that this is really considered a design flaw? Is the rise of half-precision machine learning another factor to consider here? Downcasting to half is simple enough from our output though? And a custom analysis class could operate in half precision without too much disruption, I suppose, if there is even a genuine need for that rather than just feeding to some ML library.

tylerjereddy avatar Jun 15 '22 02:06 tylerjereddy

I don't think we are being impeded by the lack of datatype flexibility; if anything, adding it would make things a lot more complicated.

However, as you say, the pursuit of pure performance with the current setup will likely come at the cost of some flexibility long-term. This is why I thought this was worth some discussion now.

I suppose one idea might be to not jump into this right away, but when writing new things or making changes, leaving the door open for a dtype=np.float32 argument in some places, if it isn't too distracting.

This is the approach I took in Cythonising Timestep in #3683, but I am trying to avoid leaving too many loose ends for someone to clean up later.

For my part, downcasting the output is no stress; it's more a question of whether anyone (?) uses MDA for high-ish precision things.

hmacdope avatar Jun 15 '22 06:06 hmacdope

Did I see on Discord that some folks at a conference wanted multi-precision? Could someone summarize that here? For the specific linked PR gh-3544, that subset of changes looks like it could go in pretty soon IMO, even if we don't go too crazy on this just yet.

tylerjereddy avatar Aug 28 '22 01:08 tylerjereddy

It would be nice for performance reasons to be single precision (Hugo and I have worked on vectorising, where width matters).

Also, the current pattern of inputting single and returning double makes chaining calls trickier / less logical.
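For example (a minimal sketch of the pattern being described, with distance_array standing in for the lib.distances routines generally):

import numpy as np
from MDAnalysis.lib.distances import distance_array

# float32 coordinates in, float64 distances out under the current setup,
# so chaining the result into another float32-based call means casting again
a = np.random.random((10, 3)).astype(np.float32)
b = np.random.random((10, 3)).astype(np.float32)

d = distance_array(a, b)
print(a.dtype, d.dtype)  # float32 float64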

I think a dtype argument to Universe, with all functions being fused types/templates, is the best of both worlds.

richardjgowers avatar Aug 28 '22 01:08 richardjgowers

So, I think I'm a bit more positive about changes like this in isolation if the tests pass and we're backwards compatible, at least initially. Not sure we should jump in on the bigger/more invasive changes right away, but gradually working toward the dtype flexibility may not hurt as accelerators with low-precision performance offerings become more popular, etc.

tylerjereddy avatar Aug 28 '22 01:08 tylerjereddy

I think we're agreeing that flexibility is good and single-precision default is good (me for backward compatibility reasons, and you for performance), IIUC.

tylerjereddy avatar Aug 28 '22 01:08 tylerjereddy

I'm also supposing that if we're mostly single-precision internally but sometimes returning double, it isn't too much of a "break" to start returning the true precision used for the calculation, since the double type is a bit of a lie if it is just a return-time upcast.

tylerjereddy avatar Aug 28 '22 01:08 tylerjereddy

Is the best strategy perhaps to work on type flexibility around the edges, e.g. on all ops, before relaxing the explicit requirements such as this one? https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/timestep.pyx#L200

hmacdope avatar Aug 28 '22 02:08 hmacdope

Is the best strategy perhaps to work on type flexibility around the edges

Might be; at least for me, the idea of starting with the more straightforward changes sounds appealing. Worst case, you end up with a subset of functions that gain more flexibility even though the core library remains type-rigid for a while. Just my opinion though; I know some people don't like partial implementations as a whole.

tylerjereddy avatar Aug 28 '22 16:08 tylerjereddy