Allow data to specify statistics about the data so that they do not have to be computed

Open Cadair opened this issue 5 years ago • 4 comments

Is your feature request related to a problem? Please describe it: When using large distributed datasets, or data on slower storage devices, it would be useful to be able to use precomputed statistics about the data from the metadata in the file.

This could include values such as the min, max, std, and 99.9th percentile. The image viewer and other components would then use these values rather than sampling the array to compute them.

In addition, it might be useful if these statistics could be specified per axis or per dask chunk. This would allow an axis like Stokes to have a different dynamic range for each index along the axis without having to do the computation (which could lead to a lot of dask chunks being loaded).

Describe the solution you'd like: An API for data factories to be able to specify these precomputed statistics.
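As a rough illustration of what such an API might carry, here is a minimal sketch of a container that a data factory could fill from file metadata. The class name `PrecomputedStats`, its fields, and the statistic names are all hypothetical; nothing here is part of glue. A lookup that finds no stored value returns `None`, so the caller knows it still has to compute the statistic.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PrecomputedStats:
    """Hypothetical container for statistics read from file metadata."""

    # Statistics over the whole array, e.g. {"minimum": 0.0, "maximum": 10.0}
    global_stats: Dict[str, float] = field(default_factory=dict)
    # Statistics for a single slice, keyed by (axis, index along that axis)
    per_slice: Dict[Tuple[int, int], Dict[str, float]] = field(default_factory=dict)

    def lookup(self, name: str, axis: Optional[int] = None,
               index: Optional[int] = None) -> Optional[float]:
        """Return the stored value, or None if it has to be computed."""
        if axis is not None and index is not None:
            # Do not fall back to the global value here: the range of the
            # whole array is not the range of one slice.
            return self.per_slice.get((axis, index), {}).get(name)
        return self.global_stats.get(name)


# Example: axis 0 plays the role of a Stokes-like axis with two slices
# that have very different dynamic ranges (made-up numbers).
stats = PrecomputedStats(
    global_stats={"minimum": -5.0, "maximum": 900.0},
    per_slice={
        (0, 0): {"minimum": 0.0, "maximum": 900.0},
        (0, 1): {"minimum": -5.0, "maximum": 5.0},
    },
)
```

A viewer could call `stats.lookup("maximum", axis=0, index=1)` when displaying the second slice and only touch the array if the result is `None`.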

Cadair avatar May 07 '20 10:05 Cadair

Just a quick note that for more complex cases, overloading compute_statistic might be the easiest way to do this.
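To make the overloading idea concrete, here is a minimal sketch. It uses a stand-in base class rather than glue's real data class, and its `compute_statistic` takes only the statistic name, which is a simplification assumed for illustration. The subclass answers from precomputed values when it can and defers to the parent otherwise.

```python
import statistics
from typing import Dict, List, Optional


class ArrayData:
    """Stand-in for a data class (an assumption, not glue's real class)."""

    def __init__(self, values: List[float]):
        self.values = values
        self.n_computed = 0  # counts how often the full array was scanned

    def compute_statistic(self, statistic: str) -> float:
        self.n_computed += 1
        if statistic == "minimum":
            return min(self.values)
        if statistic == "maximum":
            return max(self.values)
        if statistic == "mean":
            return statistics.fmean(self.values)
        raise ValueError(f"unknown statistic: {statistic}")


class ArrayDataWithMetadata(ArrayData):
    """Overrides compute_statistic to prefer values taken from metadata."""

    def __init__(self, values: List[float],
                 precomputed: Optional[Dict[str, float]] = None):
        super().__init__(values)
        self.precomputed = dict(precomputed or {})

    def compute_statistic(self, statistic: str) -> float:
        # Use the stored value if the file provided one ...
        if statistic in self.precomputed:
            return self.precomputed[statistic]
        # ... otherwise fall back to scanning the array.
        return super().compute_statistic(statistic)


data = ArrayDataWithMetadata(
    [1.0, 4.0, 2.0],
    precomputed={"minimum": 1.0, "maximum": 4.0},
)
```

With this arrangement, asking for `"minimum"` or `"maximum"` never touches the array, while `"mean"` still triggers a computation because no precomputed value was supplied.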

astrofrog avatar May 07 '20 10:05 astrofrog

I am really interested in working on this issue.

kakirastern avatar May 07 '20 22:05 kakirastern

Let me start with some AIA and HMI data first, as @Cadair has suggested.

kakirastern avatar Jul 24 '20 16:07 kakirastern

I found the keywords needed for the statistics of the AIA and HMI data sets. Let me try to do the same for IRIS Level 2 raster and SJI data cubes to see if something similar can be found.

kakirastern avatar Jul 31 '20 07:07 kakirastern