
[TickStore] Issue with daylight saving

Open krywen opened this issue 7 years ago • 3 comments

Arctic Version

1.68.0

Arctic Store

TickStore

Platform and version

Ubuntu 14.04.5 LTS, python 2.7.6

Description of problem and/or code sample that reproduces the issue

Data written at a timestamp that falls within a daylight saving transition is not read back with the same timestamp.

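import pandas as pd
import pytz
import logging
from arctic import Arctic, TICK_STORE
from arctic.date import DateRange
from datetime import datetime, date, timedelta
from pymongo import MongoClient


def main():
    # Assumes a MongoDB instance reachable on the default local host and port
    mongo_client = MongoClient()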

    store = Arctic(mongo_client)
    lib_name = 'daylight'
    lib_type = TICK_STORE
    symbol = 'APPLESTOCKS'

    current_range = DateRange(datetime(2015, 10, 25, tzinfo=pytz.utc),
                              datetime.combine(date(2015, 10, 25), datetime.max.time()).replace(tzinfo=pytz.utc))

    store.initialize_library(lib_name, lib_type=lib_type)
    library = store[lib_name]

    index = [datetime(2015, 10, 25, tzinfo=pytz.utc)+i*timedelta(hours=0.5) for i in range(24)]
    df = pd.DataFrame(data={'data': [i for i in range(24)]}, index=index)
    print df
    library.delete(symbol, date_range=current_range)
    library.write(symbol, df)
    read_df = library.read(symbol, date_range=current_range)
    print read_df


if __name__ == '__main__':
    main()

This code prints

                             data
2015-10-25 00:00:00+00:00     0
2015-10-25 00:30:00+00:00     1
2015-10-25 01:00:00+00:00     2
2015-10-25 01:30:00+00:00     3
2015-10-25 02:00:00+00:00     4
2015-10-25 02:30:00+00:00     5
2015-10-25 03:00:00+00:00     6
2015-10-25 03:30:00+00:00     7
2015-10-25 04:00:00+00:00     8
2015-10-25 04:30:00+00:00     9
2015-10-25 05:00:00+00:00    10
2015-10-25 05:30:00+00:00    11
2015-10-25 06:00:00+00:00    12
2015-10-25 06:30:00+00:00    13
2015-10-25 07:00:00+00:00    14
2015-10-25 07:30:00+00:00    15
2015-10-25 08:00:00+00:00    16
2015-10-25 08:30:00+00:00    17
2015-10-25 09:00:00+00:00    18
2015-10-25 09:30:00+00:00    19
2015-10-25 10:00:00+00:00    20
2015-10-25 10:30:00+00:00    21
2015-10-25 11:00:00+00:00    22
2015-10-25 11:30:00+00:00    23

                           data
2015-10-25 01:00:00+00:00   0.0
2015-10-25 01:30:00+00:00   1.0
2015-10-25 01:00:00+00:00   2.0
2015-10-25 01:30:00+00:00   3.0
2015-10-25 02:00:00+00:00   4.0
2015-10-25 02:30:00+00:00   5.0
2015-10-25 03:00:00+00:00   6.0
2015-10-25 03:30:00+00:00   7.0
2015-10-25 04:00:00+00:00   8.0
2015-10-25 04:30:00+00:00   9.0
2015-10-25 05:00:00+00:00  10.0
2015-10-25 05:30:00+00:00  11.0
2015-10-25 06:00:00+00:00  12.0
2015-10-25 06:30:00+00:00  13.0
2015-10-25 07:00:00+00:00  14.0
2015-10-25 07:30:00+00:00  15.0
2015-10-25 08:00:00+00:00  16.0
2015-10-25 08:30:00+00:00  17.0
2015-10-25 09:00:00+00:00  18.0
2015-10-25 09:30:00+00:00  19.0
2015-10-25 10:00:00+00:00  20.0
2015-10-25 10:30:00+00:00  21.0
2015-10-25 11:00:00+00:00  22.0
2015-10-25 11:30:00+00:00  23.0

I would have expected the second dataframe to have the same timestamps as the first one. Please note that a daylight saving event happened on 2015-10-25: in the UK the clocks went back one hour, from 02:00 BST to 01:00 GMT, at 01:00 UTC.

My local timezone is Europe/London, if that is relevant.
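To make the transition concrete, here is a minimal sketch (using only pandas, not Arctic) of how the first four half-hourly UTC timestamps map to Europe/London wall-clock time. The variable names are illustrative and not taken from the report.

```python
import pandas as pd

# Four half-hourly UTC instants straddling the UK clock change at 01:00 UTC.
utc_index = pd.date_range("2015-10-25 00:00", periods=4, freq="30min", tz="UTC")
london_index = utc_index.tz_convert("Europe/London")

for utc_ts, local_ts in zip(utc_index, london_index):
    print(utc_ts, "->", local_ts)

# The wall-clock time repeats, so only the UTC offset tells the rows apart.
wall_clock = [ts.strftime("%H:%M") for ts in london_index]
offsets_hours = [ts.utcoffset().total_seconds() / 3600 for ts in london_index]
print(wall_clock)
print(offsets_hours)
```

The local wall-clock times 01:00 and 01:30 each occur twice, once at +01:00 and once at +00:00, so a display that gets the offset wrong for the first two rows would look like the second dataframe above.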

A similar issue happens on (2015, 3, 29), when Daylight Saving Time started. The output of the previous code with date (2015, 3, 29) is:

                             data
2015-03-29 00:00:00+00:00     0
2015-03-29 00:30:00+00:00     1
2015-03-29 01:00:00+00:00     2
2015-03-29 01:30:00+00:00     3
2015-03-29 02:00:00+00:00     4
2015-03-29 02:30:00+00:00     5
2015-03-29 03:00:00+00:00     6
2015-03-29 03:30:00+00:00     7
2015-03-29 04:00:00+00:00     8
2015-03-29 04:30:00+00:00     9
2015-03-29 05:00:00+00:00    10
2015-03-29 05:30:00+00:00    11
2015-03-29 06:00:00+00:00    12
2015-03-29 06:30:00+00:00    13
2015-03-29 07:00:00+00:00    14
2015-03-29 07:30:00+00:00    15
2015-03-29 08:00:00+00:00    16
2015-03-29 08:30:00+00:00    17
2015-03-29 09:00:00+00:00    18
2015-03-29 09:30:00+00:00    19
2015-03-29 10:00:00+00:00    20
2015-03-29 10:30:00+00:00    21
2015-03-29 11:00:00+00:00    22
2015-03-29 11:30:00+00:00    23

                           data
2015-03-29 00:00:00+00:00   0.0
2015-03-29 00:30:00+00:00   1.0
2015-03-29 02:00:00+01:00   2.0
2015-03-29 02:30:00+01:00   3.0
2015-03-29 03:00:00+01:00   4.0
2015-03-29 03:30:00+01:00   5.0
2015-03-29 04:00:00+01:00   6.0
2015-03-29 04:30:00+01:00   7.0
2015-03-29 05:00:00+01:00   8.0
2015-03-29 05:30:00+01:00   9.0
2015-03-29 06:00:00+01:00  10.0
2015-03-29 06:30:00+01:00  11.0
2015-03-29 07:00:00+01:00  12.0
2015-03-29 07:30:00+01:00  13.0
2015-03-29 08:00:00+01:00  14.0
2015-03-29 08:30:00+01:00  15.0
2015-03-29 09:00:00+01:00  16.0
2015-03-29 09:30:00+01:00  17.0
2015-03-29 10:00:00+01:00  18.0
2015-03-29 10:30:00+01:00  19.0
2015-03-29 11:00:00+01:00  20.0
2015-03-29 11:30:00+01:00  21.0
2015-03-29 12:00:00+01:00  22.0
2015-03-29 12:30:00+01:00  23.0

These timestamps seem correct to me, but they are returned in a different timezone. Is this expected?
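As a quick check that the +01:00 rows denote the same instants as the written UTC rows, the sketch below compares one written timestamp with its read-back counterpart from the output above. It uses plain pandas Timestamps and does not involve Arctic.

```python
import pandas as pd

# Row 2 as written (UTC) and as read back (shown with a +01:00 offset).
written = pd.Timestamp("2015-03-29 01:00:00", tz="UTC")
read_back = pd.Timestamp("2015-03-29 02:00:00+01:00")

# Timezone-aware timestamps compare by instant, not by wall-clock text.
same_instant = written == read_back
print(same_instant)
print(written.timestamp(), read_back.timestamp())
```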

krywen, Oct 08 '18 16:10

I'm not sure what the correct thing would be. You're using UTC to read out the results, no? UTC itself is not affected by DST, and TickStore returns data back to you in your current timezone:

# Present data in the user's default TimeZone
rtn.index = rtn.index.tz_convert(mktz())

@jamesblackburn ?

bmoscon, Oct 13 '18 22:10

This is just a printing issue; converting back to UTC reproduces the right timestamps:

read_df.index = read_df.index.tz_convert(pytz.utc)

In my personal opinion, TickStore should only return UTC timestamps. In this specific case, though, DateTimeIndexArray seems to convert to and from different timezones without hiccups like https://github.com/manahl/arctic/issues/638
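A minimal sketch of that workaround as a reusable step follows. The helper name index_to_utc is hypothetical, and the read_back frame is a stand-in built with pandas to mimic a library.read() result; Arctic itself is not called here.

```python
import pandas as pd


def index_to_utc(df):
    """Return a copy of df whose timezone-aware index is expressed in UTC."""
    out = df.copy()
    out.index = out.index.tz_convert("UTC")
    return out


# The frame as written: 24 half-hourly UTC timestamps on the transition day.
written = pd.DataFrame(
    {"data": list(range(24))},
    index=pd.date_range("2015-10-25", periods=24, freq="30min", tz="UTC"),
)

# Stand-in for the read result: same data, index shown in the local timezone.
read_back = written.copy()
read_back.index = read_back.index.tz_convert("Europe/London")

normalized = index_to_utc(read_back)
indexes_match = normalized.index.equals(written.index)
print(indexes_match)
```

Comparing the normalized index against the written one is a more reliable check than comparing printed output, since the printed offsets are where the discrepancy shows up.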

krywen, Oct 15 '18 14:10

@bmoscon @krywen The problem appears to happen in these lines in tickstore.py:

# Present data in the user's default TimeZone
rtn.index = rtn.index.tz_convert(mktz())

The tzfile returned by mktz() does something evil: I think it puts the first 2 pandas Timestamp instances in the index into an inconsistent state. The raw datetime64 values themselves are correct, but something else about those Timestamps isn't.

If I change that line to

import pytz
rtn.index = rtn.index.tz_convert(pytz.timezone('Europe/London'))

and then print the dataframe I get the correct result:

2015-10-25 01:00:00+01:00   0.0
2015-10-25 01:30:00+01:00   1.0
2015-10-25 01:00:00+00:00   2.0
2015-10-25 01:30:00+00:00   3.0
2015-10-25 02:00:00+00:00   4.0
2015-10-25 02:30:00+00:00   5.0
2015-10-25 03:00:00+00:00   6.0
2015-10-25 03:30:00+00:00   7.0
2015-10-25 04:00:00+00:00   8.0
2015-10-25 04:30:00+00:00   9.0
2015-10-25 05:00:00+00:00  10.0
2015-10-25 05:30:00+00:00  11.0
2015-10-25 06:00:00+00:00  12.0
2015-10-25 06:30:00+00:00  13.0
2015-10-25 07:00:00+00:00  14.0
2015-10-25 07:30:00+00:00  15.0
2015-10-25 08:00:00+00:00  16.0
2015-10-25 08:30:00+00:00  17.0
2015-10-25 09:00:00+00:00  18.0
2015-10-25 09:30:00+00:00  19.0
2015-10-25 10:00:00+00:00  20.0
2015-10-25 10:30:00+00:00  21.0
2015-10-25 11:00:00+00:00  22.0
2015-10-25 11:30:00+00:00  23.0
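One way to compare the two timezone implementations side by side, outside Arctic, is sketched below. It assumes that the tzfile mentioned above behaves like the object returned by dateutil's tz.gettz, which this thread does not confirm, and the printed offsets may differ between pandas, dateutil and Python versions.

```python
import pandas as pd
import pytz
from dateutil import tz as dateutil_tz

# The four instants around the transition, stored as UTC.
utc_index = pd.date_range("2015-10-25 00:00", periods=4, freq="30min", tz="UTC")

# Assumption: a dateutil tzfile stands in for what mktz() returns.
via_dateutil = utc_index.tz_convert(dateutil_tz.gettz("Europe/London"))
via_pytz = utc_index.tz_convert(pytz.timezone("Europe/London"))

# The stored instants should be identical whichever tz object is attached.
same_instants = bool((via_dateutil.asi8 == via_pytz.asi8).all())
print(same_instants)

# Only the rendered offsets can differ, so print them next to each other.
for d_ts, p_ts in zip(via_dateutil, via_pytz):
    print(d_ts, "|", p_ts)
```

If the two columns disagree on a given installation, that would point at the timezone object's handling of the repeated hour rather than at the stored data.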

kchaliki, Oct 15 '18 18:10