Rounding in num2date

Open davidhassell opened this issue 8 years ago • 14 comments

Hello,

I'm getting some rounding issues from netCDF4.netcdftime.utime.num2date with the Gregorian calendar - it occasionally throws up some microseconds where none should be. Is this a known issue, perhaps a limitation of the algorithm?

In [1]: import netCDF4

In [2]: netCDF4.__version__
Out[2]: '1.2.4'

In [3]: u = netCDF4.netcdftime.utime('hours since 1999-12-1')

In [4]: u.num2date(1.0)
Out[4]: datetime.datetime(1999, 12, 1, 1, 0)

In [5]: u.num2date(2.0)
Out[5]: datetime.datetime(1999, 12, 1, 2, 0, 0, 6)

In [6]: u.num2date(3.0)
Out[6]: datetime.datetime(1999, 12, 1, 3, 0)

Also with days:

In [7]: u = netCDF4.netcdftime.utime('days since 1999-12-1')

In [8]: u.num2date(1./24)
Out[8]: datetime.datetime(1999, 12, 1, 1, 0)

In [9]: u.num2date(2./24)
Out[9]: datetime.datetime(1999, 12, 1, 2, 0, 0, 6)

In [10]: u.num2date(3./24)
Out[10]: datetime.datetime(1999, 12, 1, 3, 0)

Many thanks,

David

davidhassell avatar Oct 04 '16 10:10 davidhassell

I believe that this is a known issue: the documentation string for netcdftime.utime states:

Example usage:

>>> from netcdftime import utime
>>> from datetime import  datetime
>>> cdftime = utime('hours since 0001-01-01 00:00:00')
>>> date = datetime.now()
>>> print date
2006-03-17 16:04:02.561678
>>>
>>> t = cdftime.date2num(date)
>>> print t
17577328.0672
>>>
>>> date = cdftime.num2date(t)
>>> print date
2006-03-17 16:04:02
>>>

The resolution of the transformation operation is approximately 0.1 seconds.

The documentation of date2num and num2date at http://unidata.github.io/netcdf4-python/ should probably mention this too, though.

ckhroulev avatar Oct 04 '16 15:10 ckhroulev

I think that documentation is out of date. The current date2num docs say

"Accuracy is somewhere between a millisecond and a microsecond"

and if I rerun the example in that docstring I get

2016-10-04 16:45:25.858372
17669824.7572
2016-10-04 16:45:25.858378

jswhit avatar Oct 04 '16 20:10 jswhit

Pull request #591 updates the docstrings to state millisecond accuracy everywhere.

jswhit avatar Oct 05 '16 12:10 jswhit

Hello,

Thank you for clarifying this.

It occurs to me that, as the accuracy is 0.001 seconds, couldn't answers be rounded to the nearest millisecond? This would prevent downstream issues that arise when comparisons between two datetimes give the wrong answer due to spurious microsecond values.

David

davidhassell avatar Oct 05 '16 13:10 davidhassell

datetime uses seconds and microseconds, not milliseconds.

jswhit avatar Oct 05 '16 14:10 jswhit

If you use timedelta to compare two datetime instances, you can set the resolution to milliseconds with timedelta(milliseconds=1).
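
For example, something along these lines (a sketch reusing the utime instance from the original report):

from datetime import datetime, timedelta
import netCDF4

u = netCDF4.netcdftime.utime('hours since 1999-12-1')
d = u.num2date(2.0)                      # may carry a few spurious microseconds
expected = datetime(1999, 12, 1, 2)
# compare with a millisecond tolerance instead of testing for exact equality
print(abs(d - expected) < timedelta(milliseconds=1))

True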

jswhit avatar Oct 05 '16 14:10 jswhit

Yes, but you could round to 1000 microseconds :)

I'm not sure what you mean by the timedelta. I'm thinking of operations like date1 < date2, where date1 and date2 are datetime objects, e.g.

>>> from datetime import datetime
>>> datetime(2000, 1, 2, 0, 0, 0, 100007) > datetime(2000, 1, 2, 0, 0, 0, 100006)
True

If the last few microsecond digits are noise, we run into difficulties.

davidhassell avatar Oct 05 '16 15:10 davidhassell

Disregard my timedelta comment. It's not immediately obvious to me how 1000-microsecond rounding could be implemented, but I'll give it some thought.

jswhit avatar Oct 05 '16 16:10 jswhit

That's great - thanks.

davidhassell avatar Oct 05 '16 21:10 davidhassell

After thinking about this some more, I don't see how rounding to the nearest millisecond is going to help. You will still have spurious microseconds showing up in the datetime instances, just as in your example. Am I missing something?

jswhit avatar Oct 06 '16 12:10 jswhit

I was thinking, somewhat naively I suspect, of something along these lines in, e.g., DateFromJulianDay: replacing

microsecond = microsecond.astype(np.int32)

with

microsecond = microsecond.astype(np.int32).round(-3)
if microsecond == 1000000:
    second += 1
    microsecond -= 1000000
    # Uh oh - what if second is now 60 ... ?

So that the returned microseconds value is always one of 0, 1000, 2000, ..., 998000, 999000.

But I see the difficulty in this approach of propagating the rounding up the ladder of datetime elements.
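
To make the concern concrete, here is a small self-contained sketch of the problem case (illustrative values only, not the actual DateFromJulianDay internals):

import numpy as np

second, microsecond = 59, 999600              # 59.9996 s into the minute
microsecond = int(np.round(microsecond, -3))  # rounds up to 1000000
if microsecond == 1000000:
    second += 1                               # second is now 60 ...
    microsecond = 0
# ... so the carry would still have to ripple up through minutes, hours,
# days and months before a valid datetime could be built.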

davidhassell avatar Oct 06 '16 13:10 davidhassell

I think that rounding is inevitably going to produce surprising results in some situations. Perhaps it's better to leave it up to the user to round the datetime instances (perhaps using some of the ideas here: http://stackoverflow.com/questions/3463930/how-to-round-the-minute-of-a-datetime-object-python).

For example,

import netCDF4
import datetime
u = netCDF4.netcdftime.utime('hours since 1999-12-1')
d = u.num2date(2.0)
print d
print d-datetime.timedelta(microseconds=d.microsecond) # microsecond floor

1999-12-01 02:00:00.000006
1999-12-01 02:00:00
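
A nearest-millisecond rounding (rather than a floor) can be built the same way; since the adjustment is applied as a timedelta, any carry into seconds, minutes or beyond is handled by datetime itself, which sidesteps the "what if second is now 60" worry above. A rough sketch:

import datetime

def round_millisecond(d):
    # round a datetime to the nearest millisecond; the timedelta
    # arithmetic absorbs any carry into the larger fields
    excess = d.microsecond % 1000
    if excess >= 500:
        return d + datetime.timedelta(microseconds=1000 - excess)
    return d - datetime.timedelta(microseconds=excess)

print(round_millisecond(datetime.datetime(1999, 12, 1, 2, 0, 0, 6)))
print(round_millisecond(datetime.datetime(1999, 12, 31, 23, 59, 59, 999600)))

1999-12-01 02:00:00
2000-01-01 00:00:00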

jswhit avatar Oct 14 '16 18:10 jswhit

Maybe related to this, I get the following rounding error when the "since" string has less-than-second resolution:

netCDF4.num2date(0., 'seconds since 2013-05-15T00:00:34.653020')
datetime.datetime(2013, 5, 15, 0, 0, 34)

netCDF4.num2date(0.3, 'seconds since 2013-05-15T00:00:34.653020')
datetime.datetime(2013, 5, 15, 0, 0, 34, 300000)

floogit avatar Dec 27 '18 15:12 floogit

cftime does not currently support less-than-second resolution in the units string. If this is an important use case for you, please create an issue at https://github.com/Unidata/cftime/issues.
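
In the meantime, one possible workaround (an untested sketch that assumes the units string ends in a single fractional-seconds suffix) is to strip the fraction from the units and add it back afterwards as a timedelta:

import datetime
import netCDF4

units = 'seconds since 2013-05-15T00:00:34.653020'
base, frac = units.split('.')                       # 'seconds since ...T00:00:34', '653020'
offset = datetime.timedelta(microseconds=int(frac.ljust(6, '0')[:6]))
print(netCDF4.num2date(0.3, base) + offset)

2013-05-15 00:00:34.953020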

jswhit avatar Dec 27 '18 17:12 jswhit