netcdf4-python
What's the best way to prevent effect of `valid_min` et al.?
The fact that v1.2.9 now honors `valid_min` etc. may have severe effects wherever people have been sloppy with these attributes. (In my experience, there are a lot of such files.)
Is there an easy way to make sure the new standard behaviour does not take effect? My understanding is that `set_auto_maskandscale` is set at creation of the dataset. Is that correct?
Note that I'm not advocating the sloppiness I described. But there's a lot of data we have to use without being able to make sure it complies with the standards.
(related to #703)
Pull request #708 (which will be in 1.3.0) tries to address this by not using `valid_min`/`valid_max`/`missing_value`/`_FillValue` if the values of those attributes cannot be safely cast to the variable dtype, and instead issuing a warning. Does this work for you? If so, I'll expedite a release. If not, let's discuss what else could be done.
Note that `set_auto_mask` can be used at any time (not just at variable creation time) to disable the use of `valid_min`/`valid_max`/`missing_value` to return masked arrays.
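For instance, something along these lines (a minimal sketch; the file name `sloppy.nc` and the variable name `lat` are made up):

```python
import netCDF4

ds = netCDF4.Dataset("sloppy.nc")   # hypothetical file
ds.set_auto_mask(False)             # can be called at any time after opening;
                                    # applies to all variables in the dataset
raw = ds.variables["lat"][:]        # plain ndarray, no masking applied

# the same switch also exists per variable:
ds.variables["lat"].set_auto_mask(False)
ds.close()
```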
> Note that `set_auto_mask` can be used at any time (not just at variable creation time) to disable the use of `valid_min`/`valid_max`/`missing_value` to return masked arrays.
Thanks. I think this is what I was looking for.
> Pull request #708 (which will be in 1.3.0) tries to address this by not using `valid_min`/`valid_max`/`missing_value`/`_FillValue` if the values of those attributes cannot be safely cast to the variable dtype, and instead issuing a warning. Does this work for you? If so, I'll expedite a release. If not, let's discuss what else could be done.
This definitely is a good thing. I'm more worried about cases where data sets suddenly stop working, though.
(Disclaimer: I'm definitely biased towards pessimism about this feature after spending a day debugging weird behaviour of `numpy.ma`, until I realised that someone had thought it a good idea to put `valid_min=0` into a latitude field. :) )
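For illustration, a minimal reproduction of that pitfall under the 1.2.9 behaviour (file and variable names are made up):

```python
import netCDF4

with netCDF4.Dataset("pitfall.nc", "w") as ds:
    ds.createDimension("lat", 5)
    lat = ds.createVariable("lat", "f4", ("lat",))
    lat.valid_min = 0.0                       # wrong: latitudes can be negative
    lat[:] = [-60.0, -30.0, 0.0, 30.0, 60.0]

with netCDF4.Dataset("pitfall.nc") as ds:
    print(ds.variables["lat"][:])
    # with netCDF4 >= 1.2.9 the southern hemisphere comes back masked:
    # [-- -- 0.0 30.0 60.0]
```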
`missing_value` and `_FillValue` were always cast to the variable dtype; in 1.2.9, handling of `valid_min` and `valid_max` was added. The fact that this suddenly broke so many files for you suggests that folks have been mis-using `valid_min` and `valid_max` more often than `missing_value` for some reason.
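"Safely cast" here is roughly the following numpy-level check; this sketches the idea, not the library's actual implementation:

```python
import numpy as np

var_dtype = np.dtype("i2")    # e.g. a short-integer variable
attr_dtype = np.dtype("f8")   # e.g. a double-precision valid_max attribute

# float64 cannot be safely cast to int16, so an attribute of this type
# would be ignored (with a warning) instead of being used for masking:
print(np.can_cast(attr_dtype, var_dtype, casting="safe"))   # False
```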
From a quick test with ASCAT wind data (which also seem to suffer from #703), it looks like `Dataset.set_auto_mask(False)` also disables masking according to the missing value attribute?
@jswhit: I'm quite busy atm, so I'll just pin netCDF4 to v1.2.8 in all my envs to be safe in the short run.
If you're interested, however, I'd definitely be willing to contribute some regression tests and data sets suffering from either broken valid ranges or from #703, to assess more formally how to deal with these edge cases.
Yes, `Dataset.set_auto_mask(False)` disables all masking. I already have a regression test for broken `valid_range`.
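For anyone curious, such a test could look roughly like this (a sketch under the PR #708 semantics, not the actual test from the repository; names and values are assumptions):

```python
import unittest
import numpy as np
import netCDF4

class BrokenValidRangeTestCase(unittest.TestCase):
    def test_unusable_valid_range_is_ignored(self):
        with netCDF4.Dataset("broken_range.nc", "w") as ds:
            ds.createDimension("x", 3)
            v = ds.createVariable("v", "i2", ("x",))
            v.valid_range = np.array([0.0, 3.0e9])   # cannot fit into i2
            v[:] = [1, 2, 3]
        with netCDF4.Dataset("broken_range.nc") as ds:
            data = ds.variables["v"][:]
            # the unusable attribute should be ignored (with a warning),
            # so nothing comes back masked
            self.assertFalse(np.ma.is_masked(data))

if __name__ == "__main__":
    unittest.main()
```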
We are also affected by the change in 1.2.9 concerning the automatic masking of values outside the `valid_range`. I agree that the new behaviour is the right way to handle these attributes, but unfortunately we have to deal with lots of files where these attributes are set wrongly. Setting `set_auto_mask` to false is not a good option for us, because it also disables the automatic conversion for the `_FillValue`. Re-implementing all the logic for handling missing values in client code does not seem like a good idea either, since it is already handled nicely in the library.

Would it therefore be possible to provide an additional function to disable only the newly implemented `valid_range` masking? I would prefer if that were not needed, but I cannot think of any more elegant way of making all these files work again ... Thanks for considering!
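For completeness, the manual client-side handling we would like to avoid looks roughly like this (a sketch; the file name `data.nc` and the variable name `wind_speed` are hypothetical, and `missing_value` would need the same treatment):

```python
import numpy as np
import netCDF4

ds = netCDF4.Dataset("data.nc")    # hypothetical file
ds.set_auto_mask(False)            # ignores valid_min/valid_max, but also _FillValue

var = ds.variables["wind_speed"]   # hypothetical variable
raw = var[:]                       # plain ndarray
if "_FillValue" in var.ncattrs():
    # re-apply only the _FillValue part of the masking by hand
    data = np.ma.masked_equal(raw, var.getncattr("_FillValue"))
else:
    data = raw
ds.close()
```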