netcdf4-python
Rounding in num2date
Hello,
I'm getting some rounding issues from netCDF4.netcdftime.utime.num2date with the Gregorian calendar - it occasionally throws up some microseconds where none should be. Is this a known issue, perhaps a limitation of the algorithm?
In [1]: import netCDF4
In [2]: netCDF4.__version__
Out[2]: '1.2.4'
In [3]: u = netCDF4.netcdftime.utime('hours since 1999-12-1')
In [4]: u.num2date(1.0)
Out[4]: datetime.datetime(1999, 12, 1, 1, 0)
In [5]: u.num2date(2.0)
Out[5]: datetime.datetime(1999, 12, 1, 2, 0, 0, 6)
In [6]: u.num2date(3.0)
Out[6]: datetime.datetime(1999, 12, 1, 3, 0)
Also with days:
In [7]: u = netCDF4.netcdftime.utime('days since 1999-12-1')
In [8]: u.num2date(1./24)
Out[8]: datetime.datetime(1999, 12, 1, 1, 0)
In [9]: u.num2date(2./24)
Out[9]: datetime.datetime(1999, 12, 1, 2, 0, 0, 6)
In [10]: u.num2date(3./24)
Out[10]: datetime.datetime(1999, 12, 1, 3, 0)
Many thanks,
David
I believe that this is a known issue: the documentation string for netcdftime.utime states:
Example usage:
>>> from netcdftime import utime
>>> from datetime import datetime
>>> cdftime = utime('hours since 0001-01-01 00:00:00')
>>> date = datetime.now()
>>> print date
2006-03-17 16:04:02.561678
>>>
>>> t = cdftime.date2num(date)
>>> print t
17577328.0672
>>>
>>> date = cdftime.num2date(t)
>>> print date
2006-03-17 16:04:02
>>>
The resolution of the transformation operation is approximately 0.1 seconds.
The documentation of date2num and num2date at http://unidata.github.io/netcdf4-python/ should probably mention this too, though.
I think that documentation is out of date. The current date2num docs say
"Accuracy is somewhere between a millisecond and a microsecond"
and if I rerun the example in that docstring I get
2016-10-04 16:45:25.858372
17669824.7572
2016-10-04 16:45:25.858378
Pull request #591 updates the docstrings to state millisecond accuracy everywhere.
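As a rough illustration of where the stray microseconds can come from, the sketch below measures the spacing between adjacent float64 values near the hour count printed in the docstring example. It assumes the time value is held as a float64 and requires Python 3.9+ for math.ulp; it does not reproduce the library's internal algorithm, which may lose further precision.

```python
import math

# Hour count from the docstring example ('hours since 0001-01-01 00:00:00').
t_hours = 17577328.0672

# Gap between this float64 value and the next representable one, in hours.
spacing_hours = math.ulp(t_hours)

# The same gap expressed in microseconds.
spacing_us = spacing_hours * 3600 * 1e6
print(f"{spacing_us:.1f} microseconds")  # about 13.4 microseconds
```

A spacing of roughly ten microseconds is consistent with the quoted accuracy of "somewhere between a millisecond and a microsecond" and with the few-microsecond discrepancy in the rerun above.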
Hello,
Thank you for clarifying this.
It occurs to me that as the accuracy is 0.001 seconds, couldn't answers be rounded to the nearest millisecond? This would prevent downstream issues that arise when comparisons between two datetimes give the wrong answer due to spurious microsecond values.
David
datetime uses seconds and microseconds, not milliseconds.
If you use timedelta to compare two datetime instances, you can set the resolution to milliseconds: timedelta(milliseconds=1).
Yes, but you could round to 1000 microseconds :)
I'm not sure what you mean by the timedelta. I'm thinking of operations like date1 < date2 where date1 and date2 are datetime objects, e.g.
>>> from datetime import datetime
>>> datetime(2000, 1, 2, 0, 0, 0, 100007) > datetime(2000, 1, 2, 0, 0, 0, 100006)
True
If the last few microsecond digits are noise, we run into difficulties.
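One way to keep such comparisons robust on the user side is to compare with a tolerance instead of using < or > directly. This is only a sketch: the helper name clearly_after is hypothetical, and the one-millisecond tolerance is an assumption based on the accuracy quoted earlier in this thread.

```python
from datetime import datetime, timedelta

# Assumed noise floor: the millisecond accuracy discussed in this thread.
TOLERANCE = timedelta(milliseconds=1)

def clearly_after(a, b, tol=TOLERANCE):
    """Hypothetical helper: True only if a is later than b by more than tol."""
    return a - b > tol

d1 = datetime(2000, 1, 2, 0, 0, 0, 100007)
d2 = datetime(2000, 1, 2, 0, 0, 0, 100006)
print(d1 > d2)                # True: the plain comparison sees the noise
print(clearly_after(d1, d2))  # False: the difference is within tolerance
```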
Disregard my timedelta comment. It's not immediately obvious to me how 1000 microsecond rounding could be implemented, but I'll give it some thought.
That's great - thanks.
After thinking about this some more, I don't see how rounding to the nearest millisecond is going to help. You will still have spurious microseconds showing up in the datetime instances, just as in your example. Am I missing something?
I was thinking, somewhat naively I suspect, of something along these lines in, e.g., DateFromJulianDay: replacing
microsecond = microsecond.astype(np.int32)
with
microsecond = microsecond.astype(np.int32).round(-3)
if microsecond == 1000000:
    second += 1
    microsecond -= 1000000
    # Uh oh - what if second is now 60 ... ?
So that the returned microseconds value is always one of 0, 1000, 2000, ..., 998000, 999000.
But I see the difficulty with this approach: propagating the rounding up the ladder of datetime elements.
I think that rounding is inevitably going to produce surprising results in some situations. Perhaps it's better to leave it up to the user to round the datetime instances (perhaps using some of the ideas here: http://stackoverflow.com/questions/3463930/how-to-round-the-minute-of-a-datetime-object-python)
For example,
import netCDF4
import datetime
u = netCDF4.netcdftime.utime('hours since 1999-12-1')
d = u.num2date(2.0)
print(d)
print(d - datetime.timedelta(microseconds=d.microsecond))  # microsecond floor
1999-12-01 02:00:00.000006
1999-12-01 02:00:00
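The microsecond floor above truncates. A user-side variant that rounds to the nearest millisecond can be sketched with timedelta arithmetic, which carries any overflow into seconds, minutes and beyond automatically. The helper name round_to_millisecond is hypothetical, and the sketch assumes a standard-library datetime instance (objects returned for non-standard calendars may not support this arithmetic).

```python
from datetime import datetime, timedelta

def round_to_millisecond(dt):
    """Hypothetical helper: round a datetime to the nearest millisecond."""
    # Microseconds past the last whole millisecond.
    remainder = dt.microsecond % 1000
    if remainder >= 500:
        # Round up; adding a timedelta carries into seconds, minutes, etc.
        return dt + timedelta(microseconds=1000 - remainder)
    # Round down by dropping the remainder.
    return dt - timedelta(microseconds=remainder)

print(round_to_millisecond(datetime(1999, 12, 1, 2, 0, 0, 6)))
# 1999-12-01 02:00:00
print(round_to_millisecond(datetime(1999, 12, 31, 23, 59, 59, 999999)))
# 2000-01-01 00:00:00
```

The second call shows the carry propagating all the way up to the year without any manual handling of the "second is now 60" case.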
Possibly related: I get the following rounding error when the "since" string has sub-second resolution:
>>> netCDF4.num2date(0., 'seconds since 2013-05-15T00:00:34.653020')
datetime.datetime(2013, 5, 15, 0, 0, 34)
>>> netCDF4.num2date(0.3, 'seconds since 2013-05-15T00:00:34.653020')
datetime.datetime(2013, 5, 15, 0, 0, 34, 300000)
cftime does not currently support less-than-second resolution in the units string. If this is an important use case for you, please create an issue at https://github.com/Unidata/cftime/issues.
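Until that is supported, one possible workaround is to parse the reference time yourself and add the offset with the standard library. This is a sketch only: it bypasses cftime entirely, the helper name offset_from_reference is hypothetical, and it is valid only for the standard (proleptic Gregorian) calendar that Python's datetime implements.

```python
from datetime import datetime, timedelta

def offset_from_reference(seconds, reference):
    """Hypothetical helper: add an offset in seconds to an ISO-style
    reference time that includes fractional seconds."""
    ref = datetime.strptime(reference, '%Y-%m-%dT%H:%M:%S.%f')
    return ref + timedelta(seconds=seconds)

print(offset_from_reference(0.0, '2013-05-15T00:00:34.653020'))
# 2013-05-15 00:00:34.653020
print(offset_from_reference(0.3, '2013-05-15T00:00:34.653020'))
# 2013-05-15 00:00:34.953020
```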