orix icon indicating copy to clipboard operation
orix copied to clipboard

Consistent lazy computations for demanding tasks

Open hakonanes opened this issue 2 years ago • 5 comments

The outer products between quaternion-quaternion or quaternion-vector are usually the most demanding tasks, I find. Having lazy computations for them all would be convenient. I had a look at which methods don't have a lazy implementation when reviewing #325, and it seems like there isn't much work required to support lazy computations here as well.

Supports lazy computations (including #317 and #325):

  • [x] Orientation.get_distance_matrix()
  • [x] Misorientation.get_distance_matrix()
  • [x] outer():
    • [x] Quaternion
    • [x] Rotation -> Misorientation, Orientation (these classes don't overwrite the method)
  • [ ] dot()
    • [ ] Quaternion -> Rotation, Misorientation
    • [ ] Orientation
  • [ ] dot_outer()
    • [ ] Quaternion
    • [ ] Rotation -> Misorientation
    • [ ] Orientation
  • [ ] angle_with(), uses dot()
    • [ ] Rotation -> Misorientation
    • [ ] Orientation
  • [ ] angle_with_outer(), uses dot_outer()
    • [ ] Rotation -> Misorientation
    • [ ] Orientation

Computing lazy dot products with dot() and dot_outer() will return Dask arrays, so care must be taken to document this properly. Lazy implementations of these should be as simple as replacing np with da, and supporting the extra lazy API parameters lazy, chunksize and progressbar.

hakonanes avatar Apr 29 '22 09:04 hakonanes

I agree with adding Dask for these demanding tasks, certainly functions that involve outer products. Do you think we should also do the same for dot()?

I was thinking about implementing a lazy implementation of dot_outer() in #325, but left that PR to be a small enhancement. Glad to see we are on the same wavelength about this!

Computing lazy dot products with dot() and dot_outer() will return Dask arrays, so care must be taken to document this properly. Lazy implementations of these should be as simple as replacing np with da, and supporting the extra lazy API parameters lazy, chunksize and progressbar.

I think that whilst we should allow for Dask computation, we should not expose the user to Dask arrays at any point (they will still be able to access the private functions if required). This is how it is handled in #325 and also #317, and it keeps a simple API.

harripj avatar Apr 29 '22 13:04 harripj

We can identify and keep track of the methods in this issue. I didn't add a milestone because we can implement them when we need for them.

Do you think we should also do the same for dot()?

Yes, even though there might be few cases where a dot product computation fills memory. dot() is used in angle_with(), so there might be other cases with the latter where a computation fills memory.

we should not expose the user to Dask arrays at any point

I agree when the user is an end user. When the user is another package, like kikuchipy, diffsims or pyxem, where, say, we wanted to get dot products or angles but not compute them immediately, I think having a public API returning a Dask array would be benefitial. Alternatively we could use the private functions, but that goes against "general rules" when using package dependencies. It's not necessary at the moment, but we shouldn't rule it out.

hakonanes avatar Apr 29 '22 13:04 hakonanes

I agree when the user is an end user. When the user is another package, like kikuchipy, diffsims or pyxem, where, say, we wanted to get dot products or angles but not compute them immediately, I think having a public API returning a Dask array would be benefitial. Alternatively we could use the private functions, but that goes against "general rules" when using package dependencies. It's not necessary at the moment, but we shouldn't rule it out.

Good point, if you envisage these packages needing this functionality then we should build it in. Perhaps it could be as simple as adding a flag argument return_dask or similar to the function?

harripj avatar Apr 29 '22 14:04 harripj

return_dask or similar to the function

HyperSpy introduced the boolean parameter lazy_output in their latest release (see docs), which I think is understandable enough. What do you think?

hakonanes avatar Apr 29 '22 14:04 hakonanes

HyperSpy introduced the boolean parameter lazy_output in their latest release (see docs), which I think is understandable enough. What do you think?

Makes sense to me! It's a good align syntax with a well-used package too.

harripj avatar Apr 29 '22 14:04 harripj