
Memory-efficient data input options

Open · iancze opened this issue 1 year ago · 0 comments

Is your feature request related to a problem or opportunity? Please describe.
We have been strict about the inputs to the Gridder object. It expects uu and vv to be measured in kilolambda with shape (nchan, nvis), and weights to have shape (nchan, nvis).

Describe the solution you'd like
This strictness can be cumbersome, especially for ALMA spectral line datasets with a large number of channels. A measurement set stores these quantities more efficiently: the baselines are stored in meters, so they are the same for every channel, and when the weights are the same for each channel, only one weight is stored per baseline. This means that on disk, uu, vv, and weights have shape (nvis) instead of (nchan, nvis). Because each of these three arrays is smaller by a factor of nchan, this can be a considerable memory saving for large visibility datasets with hundreds or even thousands of channels.
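To put rough numbers on that saving, here is a back-of-the-envelope sketch. The dataset size (1000 channels, 500,000 visibilities) is a hypothetical example, and the sketch assumes uu, vv, and weights are stored as 64-bit floats. It also shows that NumPy's `np.broadcast_to` can present a single per-baseline weight as an (nchan, nvis) array without copying the data.

```python
import numpy as np

def array_bytes(shape, dtype=np.float64):
    """Bytes needed to store a dense array of this shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize

nchan, nvis = 1000, 500_000  # hypothetical spectral line dataset size

# uu, vv, weights stored once per baseline (measurement-set style)
compact = 3 * array_bytes((nvis,))
# uu, vv, weights expanded to one value per channel (current Gridder input)
expanded = 3 * array_bytes((nchan, nvis))

print(f"compact:  {compact / 1e9:.3f} GB")   # compact:  0.012 GB
print(f"expanded: {expanded / 1e9:.3f} GB")  # expanded: 12.000 GB

# A weight shared by all channels can be viewed as (nchan, nvis) without a copy:
weight = np.ones(nvis)
weight_2d = np.broadcast_to(weight, (nchan, nvis))  # read-only view
assert weight_2d.shape == (nchan, nvis)
assert weight_2d.strides[0] == 0  # channel axis reuses the same memory
```

The broadcast view is read-only, so any step that needs a writable or contiguous (nchan, nvis) array would still have to make a full copy.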

Describe alternatives you've considered
At minimum, we could port the convenience routines that convert these quantities from visread to MPoL, or simply point users to the existing routines in visread. This would make life easier for users, since they could keep the file size on disk small, but expanding uu, vv, and weights to shape (nchan, nvis) in memory may still cause memory problems during inference.
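A minimal sketch of such a conversion routine, assuming baselines in meters and channel frequencies in Hz. The names `meters_to_klambda` and `broadcast_weights` are hypothetical and are not the actual visread or MPoL API. The conversion is u[kλ] = u[m] · ν / c / 10³, evaluated once per channel. Note that the returned (nchan, nvis) coordinate array is dense, which is the in-memory cost described in the paragraph above.

```python
import numpy as np

C_M_PER_S = 299_792_458.0  # speed of light [m/s]

def meters_to_klambda(uu_m, freqs_hz):
    """Hypothetical helper: convert baselines in meters, shape (nvis,),
    to kilolambda for every channel, shape (nchan, nvis)."""
    uu_m = np.asarray(uu_m, dtype=np.float64)
    freqs_hz = np.asarray(freqs_hz, dtype=np.float64)
    wavelengths = C_M_PER_S / freqs_hz  # wavelength of each channel [m], shape (nchan,)
    # baseline in wavelengths = meters / wavelength; scale by 1e-3 for kilolambda
    return uu_m[np.newaxis, :] / wavelengths[:, np.newaxis] * 1e-3

def broadcast_weights(weight, nchan):
    """Hypothetical helper: present one weight per baseline as (nchan, nvis)
    via a read-only broadcast view (no copy)."""
    w = np.asarray(weight)
    return np.broadcast_to(w, (nchan, w.shape[0]))

uu_kl = meters_to_klambda([100.0, -250.0, 1000.0], [230.0e9, 230.5e9])
print(uu_kl.shape)  # (2, 3)
```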

A more advanced option would be to adapt the Gridder, or write an alternate Gridder class, so that it takes in the measurement-set-like data products directly and performs the gridding in a memory-efficient manner. This could be helpful, but it should only be worked on after we have done proper memory profiling of a whole image synthesis procedure. It could be that the image optimization itself (and the associated derivatives) is the largest bottleneck anyway.
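One way such an alternate Gridder could avoid ever materializing (nchan, nvis) coordinate arrays is to grid one channel at a time, converting the per-baseline uu and vv from meters to kilolambda on the fly. The sketch below is hypothetical and is not MPoL's Gridder. It does nearest-cell weighted averaging only, ignores the Hermitian-conjugate points and any convolution kernel, and assumes a square grid of `npix` cells of size `du_klambda` centred on the origin. The visibilities themselves still have shape (nchan, nvis), and the output grid has shape (nchan, npix, npix).

```python
import numpy as np

C_M_PER_S = 299_792_458.0  # speed of light [m/s]

def grid_channelwise(uu_m, vv_m, weight, data, freqs_hz, npix, du_klambda):
    """Hypothetical sketch: nearest-cell weighted averaging, one channel at a time.

    uu_m, vv_m, weight: shape (nvis,), shared by all channels.
    data: complex visibilities, shape (nchan, nvis).
    Returns a complex grid of shape (nchan, npix, npix); empty cells are 0.
    """
    nchan = len(freqs_hz)
    gridded = np.zeros((nchan, npix, npix), dtype=np.complex128)
    for ichan, freq in enumerate(freqs_hz):
        # Only this channel's (nvis,) coordinates exist in memory at any time
        scale = freq / C_M_PER_S * 1e-3  # meters -> kilolambda
        iu = np.floor(uu_m * scale / du_klambda + npix // 2).astype(np.int64)
        iv = np.floor(vv_m * scale / du_klambda + npix // 2).astype(np.int64)
        keep = (iu >= 0) & (iu < npix) & (iv >= 0) & (iv < npix)
        flat = iv[keep] * npix + iu[keep]  # row index = v cell, column index = u cell
        w = weight[keep]
        vis = data[ichan, keep]
        # Sum weights and weighted real/imaginary parts per cell
        wsum = np.bincount(flat, weights=w, minlength=npix * npix)
        re = np.bincount(flat, weights=w * vis.real, minlength=npix * npix)
        im = np.bincount(flat, weights=w * vis.imag, minlength=npix * npix)
        avg = np.zeros(npix * npix, dtype=np.complex128)
        filled = wsum > 0
        avg[filled] = (re[filled] + 1j * im[filled]) / wsum[filled]
        gridded[ichan] = avg.reshape(npix, npix)
    return gridded
```

Per channel, the extra memory is a handful of (nvis,) temporaries rather than full (nchan, nvis) arrays. Whether that saving matters compared with the optimization step is exactly what the proposed memory profiling would show.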

iancze · Nov 18 '22 19:11