DefDAP

Improve masking

Open: rhysgt opened this issue 10 months ago • 6 comments

Masking is now performed when data is accessed from the datastore.

Removed the preview function from set_mask; it is no longer needed because the original data is no longer overwritten.

We should consider moving cropping and masking from hrdic into base.

rhysgt, Apr 15 '24 14:04

I've made some changes; can you check that it works as expected? You can set the mask with dic_map.data.generate('mask', mask=bool_array). How are these masks usually generated? Is it fairly standard, or do you adjust things to fit the data? We also need to change the mask function so that the stored data is not mutated. This could perhaps be done by converting to a masked array and then taking the filled array.
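A minimal NumPy sketch of the masked-array idea, assuming the map is a float array and the mask is a boolean array of the same shape where True marks points to remove. The values and variable names are illustrative only, not DefDAP's own code.

```python
import numpy as np

# Illustrative stand-in for a stored map (e.g. max shear values).
stored = np.array([[0.1, 0.9],
                   [0.5, 1.2]])

# Boolean mask: True marks points to remove, which matches the
# numpy.ma convention where True means "masked" (invalid).
mask = stored > 0.8

# Wrapping the stored data in a masked array does not modify `stored`.
masked = np.ma.masked_array(stored, mask=mask)

# filled() returns a new ndarray with masked points replaced by NaN.
# NaN needs a floating-point dtype, so integer maps would need casting first.
result = masked.filled(np.nan)

print(result)
print(stored)  # still holds the original values
```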

mikesmic, Apr 17 '24 09:04

I think I have fixed the problems you mentioned: it now uses a masked array instead of mutating the data, and it generates a null mask in a better way (?).

Most of the time, the masks I use are quite straightforward, for example (from the docs):

To remove data points in dic_map where max_shear is above 0.8, use:

mask = dic_map.data.max_shear > 0.8

To remove data points in dic_map where e11 is above 1 or less than -1, use:

mask = (dic_map.data.e[0, 0] > 1) | (dic_map.data.e[0, 0] < -1)

To remove data points in dic_map where corrVal is less than 0.4, use:

mask = dic_map.corr_val < 0.4
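The documented masks above are plain element-wise comparisons, so they can be tried outside DefDAP on small arrays. This sketch uses synthetic NumPy stand-ins for max_shear, e and corr_val, and assumes e is a (2, 2, ny, nx) tensor field so that e[0, 0] is the e11 map. Combining several conditions with | is one possible extension, not something stated in the docs.

```python
import numpy as np

# Synthetic stand-ins for DefDAP map attributes (assumed shapes):
#   max_shear: (ny, nx) scalar map
#   e:         (2, 2, ny, nx) strain tensor field; e[0, 0] is the e11 map
#   corr_val:  (ny, nx) correlation values
max_shear = np.array([[0.2, 0.9],
                      [0.5, 0.1]])
e = np.zeros((2, 2, 2, 2))
e[0, 0] = [[0.5, 1.5],
           [-2.0, 0.0]]
corr_val = np.array([[0.9, 0.8],
                     [0.7, 0.3]])

# Each condition yields a boolean map; True marks a point to remove.
shear_mask = max_shear > 0.8
e11_mask = (e[0, 0] > 1) | (e[0, 0] < -1)
corr_mask = corr_val < 0.4

# Conditions can be combined so a point is removed if it fails any test.
mask = shear_mask | e11_mask | corr_mask
print(mask)
```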

rhysgt, Apr 17 '24 15:04

Also, there is an inconsistency in function naming: calc_mask vs set_crop?

rhysgt, Apr 18 '24 09:04

I did originally call it set_mask, but that would be confusing because it doesn't set anything; it just creates a mask image that the generate function uses. set_crop, by contrast, actually sets the crop boundary values. I'm not sure about passing out masked arrays: will they work with everything else in the library? That said, when I looked at masking yesterday I couldn't find a way to create an array with NaNs in the masked positions without making a copy of the data. I need to look through the masking logic again; my goal was to only run the masking function if a mask is set.
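The copy question can be checked directly in NumPy. In this sketch (illustrative data, not DefDAP's API), constructing a masked array reuses the original buffer, while any plain ndarray with NaN in the masked positions needs new storage. A view always shows the same values as the array it views, so the only copy-free way to get NaNs would be to write them into the stored data itself.

```python
import numpy as np

data = np.arange(6, dtype=float).reshape(2, 3)
mask = data > 3

# A masked array built with the default copy=False reuses the data buffer,
# so wrapping a map avoids copying its values.
masked = np.ma.masked_array(data, mask=mask)
print(np.shares_memory(masked.data, data))  # True

# A plain ndarray with NaN at masked points needs new storage, whether made
# with filled() or np.where(); both leave `data` untouched.
nan_filled = masked.filled(np.nan)
nan_where = np.where(mask, np.nan, data)
print(np.shares_memory(nan_filled, data))  # False
```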

mikesmic, Apr 18 '24 11:04

As far as I'm aware, everything still seems to work as expected with a masked array.

It does incur some overhead, but roughly 1000 times less than the previous method.

Do we need to change the logic to not use masked arrays for data that isn't masked? Currently a masked array is always generated.

rhysgt, Apr 18 '24 11:04

A masked array is only returned when a mask is provided. If unset, the normal map data is passed through as before.
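The pass-through behaviour described here, combined with masking on access, can be sketched with a small store class. MaskedStore, add and get are hypothetical names for illustration only and are not DefDAP's datastore API (which, per this thread, takes the mask via dic_map.data.generate('mask', mask=...)). The mask convention assumed is True = remove.

```python
from typing import Dict, Optional, Union

import numpy as np


class MaskedStore:
    """Hypothetical datastore: holds raw maps and applies an optional
    boolean mask only when data is read (True = masked)."""

    def __init__(self) -> None:
        self._data: Dict[str, np.ndarray] = {}
        self.mask: Optional[np.ndarray] = None

    def add(self, name: str, values: np.ndarray) -> None:
        self._data[name] = values

    def get(self, name: str) -> Union[np.ndarray, np.ma.MaskedArray]:
        values = self._data[name]
        if self.mask is None:
            # No mask set: return the stored array itself, no extra work.
            return values
        # Mask set: wrap in a masked array; the stored array is not modified.
        return np.ma.masked_array(values, mask=self.mask)


store = MaskedStore()
raw = np.array([[0.2, 0.9],
                [0.5, 0.1]])
store.add("max_shear", raw)

unmasked = store.get("max_shear")  # plain ndarray, same object as raw
store.mask = raw > 0.8
masked = store.get("max_shear")    # MaskedArray hiding the 0.9 point
```

Masked-array reductions such as masked.mean() skip masked points, which is one reason this approach can work with existing code that calls NumPy methods on the returned data.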

rhysgt, Apr 24 '24 15:04