[TickStore] Issue with daylight saving
Arctic Version
1.68.0
Arctic Store
TickStore
Platform and version
Ubuntu 14.04.5 LTS, python 2.7.6
Description of problem and/or code sample that reproduces the issue
Data written at a timestamp that falls within a daylight saving event is not read back with the same timestamp.
import pandas as pd
import pytz
import logging
from arctic import Arctic, TICK_STORE
from arctic.date import DateRange
from datetime import datetime, date, timedelta
from pymongo import MongoClient


def main():
    # Assumption: MongoDB is reachable on the default localhost:27017
    mongo_client = MongoClient()
    store = Arctic(mongo_client)
    lib_name = 'daylight'
    lib_type = TICK_STORE
    symbol = 'APPLESTOCKS'
    # The whole of 2015-10-25, expressed in UTC
    current_range = DateRange(datetime(2015, 10, 25, tzinfo=pytz.utc),
                              datetime.combine(date(2015, 10, 25), datetime.max.time()).replace(tzinfo=pytz.utc))
    store.initialize_library(lib_name, lib_type=lib_type)
    library = store[lib_name]
    # 24 half-hourly UTC timestamps starting at midnight
    index = [datetime(2015, 10, 25, tzinfo=pytz.utc) + i * timedelta(hours=0.5) for i in range(24)]
    df = pd.DataFrame(data={'data': [i for i in range(24)]}, index=index)
    print(df)
    library.delete(symbol, date_range=current_range)
    library.write(symbol, df)
    read_df = library.read(symbol, date_range=current_range)
    print(read_df)


if __name__ == '__main__':
    main()
This code prints
data
2015-10-25 00:00:00+00:00 0
2015-10-25 00:30:00+00:00 1
2015-10-25 01:00:00+00:00 2
2015-10-25 01:30:00+00:00 3
2015-10-25 02:00:00+00:00 4
2015-10-25 02:30:00+00:00 5
2015-10-25 03:00:00+00:00 6
2015-10-25 03:30:00+00:00 7
2015-10-25 04:00:00+00:00 8
2015-10-25 04:30:00+00:00 9
2015-10-25 05:00:00+00:00 10
2015-10-25 05:30:00+00:00 11
2015-10-25 06:00:00+00:00 12
2015-10-25 06:30:00+00:00 13
2015-10-25 07:00:00+00:00 14
2015-10-25 07:30:00+00:00 15
2015-10-25 08:00:00+00:00 16
2015-10-25 08:30:00+00:00 17
2015-10-25 09:00:00+00:00 18
2015-10-25 09:30:00+00:00 19
2015-10-25 10:00:00+00:00 20
2015-10-25 10:30:00+00:00 21
2015-10-25 11:00:00+00:00 22
2015-10-25 11:30:00+00:00 23
data
2015-10-25 01:00:00+00:00 0.0
2015-10-25 01:30:00+00:00 1.0
2015-10-25 01:00:00+00:00 2.0
2015-10-25 01:30:00+00:00 3.0
2015-10-25 02:00:00+00:00 4.0
2015-10-25 02:30:00+00:00 5.0
2015-10-25 03:00:00+00:00 6.0
2015-10-25 03:30:00+00:00 7.0
2015-10-25 04:00:00+00:00 8.0
2015-10-25 04:30:00+00:00 9.0
2015-10-25 05:00:00+00:00 10.0
2015-10-25 05:30:00+00:00 11.0
2015-10-25 06:00:00+00:00 12.0
2015-10-25 06:30:00+00:00 13.0
2015-10-25 07:00:00+00:00 14.0
2015-10-25 07:30:00+00:00 15.0
2015-10-25 08:00:00+00:00 16.0
2015-10-25 08:30:00+00:00 17.0
2015-10-25 09:00:00+00:00 18.0
2015-10-25 09:30:00+00:00 19.0
2015-10-25 10:00:00+00:00 20.0
2015-10-25 10:30:00+00:00 21.0
2015-10-25 11:00:00+00:00 22.0
2015-10-25 11:30:00+00:00 23.0
I would have expected the second dataframe to have the same timestamps as the first one. Please note that a daylight saving event happened on 2015-10-25 (in the UK the clocks went back from 02:00 BST to 01:00 GMT, i.e. at 01:00 UTC).
My local timezone is Europe/London, if that is relevant.
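For reference, here is a minimal sketch of how the first four UTC timestamps of the index map onto London wall-clock time on that date. It uses only pytz and the standard library, not Arctic, and it assumes the local zone is Europe/London; the names london, utc_times and local_times are illustrative only.

```python
from datetime import datetime, timedelta

import pytz

# Assumption: the machine's local zone is Europe/London.
london = pytz.timezone('Europe/London')
start = datetime(2015, 10, 25, tzinfo=pytz.utc)

# First four half-hourly steps of the index used in the script above.
utc_times = [start + i * timedelta(minutes=30) for i in range(4)]
local_times = [t.astimezone(london) for t in utc_times]

for utc, loc in zip(utc_times, local_times):
    print('%s -> %s' % (utc.isoformat(), loc.isoformat()))
```

The wall-clock times 01:00 and 01:30 each occur twice that morning, once at +01:00 and once at +00:00, so only the UTC offset tells the rows apart.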
A similar issue happens on (2015, 3, 29), when Daylight Saving Time started. The output of the previous code with the date (2015, 3, 29) is:
data
2015-03-29 00:00:00+00:00 0
2015-03-29 00:30:00+00:00 1
2015-03-29 01:00:00+00:00 2
2015-03-29 01:30:00+00:00 3
2015-03-29 02:00:00+00:00 4
2015-03-29 02:30:00+00:00 5
2015-03-29 03:00:00+00:00 6
2015-03-29 03:30:00+00:00 7
2015-03-29 04:00:00+00:00 8
2015-03-29 04:30:00+00:00 9
2015-03-29 05:00:00+00:00 10
2015-03-29 05:30:00+00:00 11
2015-03-29 06:00:00+00:00 12
2015-03-29 06:30:00+00:00 13
2015-03-29 07:00:00+00:00 14
2015-03-29 07:30:00+00:00 15
2015-03-29 08:00:00+00:00 16
2015-03-29 08:30:00+00:00 17
2015-03-29 09:00:00+00:00 18
2015-03-29 09:30:00+00:00 19
2015-03-29 10:00:00+00:00 20
2015-03-29 10:30:00+00:00 21
2015-03-29 11:00:00+00:00 22
2015-03-29 11:30:00+00:00 23
data
2015-03-29 00:00:00+00:00 0.0
2015-03-29 00:30:00+00:00 1.0
2015-03-29 02:00:00+01:00 2.0
2015-03-29 02:30:00+01:00 3.0
2015-03-29 03:00:00+01:00 4.0
2015-03-29 03:30:00+01:00 5.0
2015-03-29 04:00:00+01:00 6.0
2015-03-29 04:30:00+01:00 7.0
2015-03-29 05:00:00+01:00 8.0
2015-03-29 05:30:00+01:00 9.0
2015-03-29 06:00:00+01:00 10.0
2015-03-29 06:30:00+01:00 11.0
2015-03-29 07:00:00+01:00 12.0
2015-03-29 07:30:00+01:00 13.0
2015-03-29 08:00:00+01:00 14.0
2015-03-29 08:30:00+01:00 15.0
2015-03-29 09:00:00+01:00 16.0
2015-03-29 09:30:00+01:00 17.0
2015-03-29 10:00:00+01:00 18.0
2015-03-29 10:30:00+01:00 19.0
2015-03-29 11:00:00+01:00 20.0
2015-03-29 11:30:00+01:00 21.0
2015-03-29 12:00:00+01:00 22.0
2015-03-29 12:30:00+01:00 23.0
This seems correct to me, but it is returned with a different timezone. Is this expected?
I'm not sure what the correct behaviour would be. You're using UTC to read out the results, no? UTC itself is not affected by DST, and TickStore returns data to you in your current timezone:
# Present data in the user's default TimeZone
rtn.index = rtn.index.tz_convert(mktz())
@jamesblackburn ?
This is just a printing issue; converting back to UTC reproduces the right timestamps.
read_df.index = read_df.index.tz_convert(pytz.utc)
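A rough way to check this without Arctic is to round-trip a plain pandas index. This is only a sketch: it assumes Europe/London as the local zone and uses pandas' own timezone lookup rather than mktz(), so it does not exercise the TickStore code path itself.

```python
import pandas as pd

# Same 24 half-hourly UTC timestamps as in the original script.
idx_utc = pd.date_range('2015-10-25', periods=24, freq='30min', tz='UTC')

# Assumption: Europe/London stands in for the user's default timezone.
idx_local = idx_utc.tz_convert('Europe/London')
round_trip = idx_local.tz_convert('UTC')

# Local presentation of the first four rows around the clock change.
for ts in idx_local[:4]:
    print(ts)

# The conversion changes only the presentation, not the instants.
print(round_trip.equals(idx_utc))
```

If the final check holds for the index read back from the library as well, the stored instants are intact and only the local-time rendering is in question.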
In my personal opinion, TickStore should only return UTC timestamps. In this specific case, though, DateTimeIndexArray seems to convert from and to different timezones without hiccups like https://github.com/manahl/arctic/issues/638
@bmoscon @krywen The problem appears to happen in these lines in tickstore.py:
# Present data in the user's default TimeZone
rtn.index = rtn.index.tz_convert(mktz())
The tzfile returned by mktz() does something evil which, I think, puts the first 2 pandas Timestamp instances in the index into an inconsistent state: the raw datetime64 value itself is correct, but something else about them isn't.
If I change that line to
import pytz
rtn.index = rtn.index.tz_convert(pytz.timezone('Europe/London'))
and then print the dataframe I get the correct result:
2015-10-25 01:00:00+01:00 0.0
2015-10-25 01:30:00+01:00 1.0
2015-10-25 01:00:00+00:00 2.0
2015-10-25 01:30:00+00:00 3.0
2015-10-25 02:00:00+00:00 4.0
2015-10-25 02:30:00+00:00 5.0
2015-10-25 03:00:00+00:00 6.0
2015-10-25 03:30:00+00:00 7.0
2015-10-25 04:00:00+00:00 8.0
2015-10-25 04:30:00+00:00 9.0
2015-10-25 05:00:00+00:00 10.0
2015-10-25 05:30:00+00:00 11.0
2015-10-25 06:00:00+00:00 12.0
2015-10-25 06:30:00+00:00 13.0
2015-10-25 07:00:00+00:00 14.0
2015-10-25 07:30:00+00:00 15.0
2015-10-25 08:00:00+00:00 16.0
2015-10-25 08:30:00+00:00 17.0
2015-10-25 09:00:00+00:00 18.0
2015-10-25 09:30:00+00:00 19.0
2015-10-25 10:00:00+00:00 20.0
2015-10-25 10:30:00+00:00 21.0
2015-10-25 11:00:00+00:00 22.0
2015-10-25 11:30:00+00:00 23.0
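To narrow this down outside Arctic, the two kinds of timezone object can be compared directly on a small index. The sketch below assumes that the tzfile comes from dateutil.tz.gettz('Europe/London'), which may not be exactly what mktz() returns on a given machine. Whether the dateutil offsets match the pytz ones may depend on the installed pandas and dateutil versions, so the script only reports mismatches rather than assuming an outcome.

```python
from datetime import timedelta

import pandas as pd
import pytz
from dateutil import tz as dateutil_tz

idx_utc = pd.date_range('2015-10-25', periods=4, freq='30min', tz='UTC')

via_pytz = idx_utc.tz_convert(pytz.timezone('Europe/London'))
# Assumption: a dateutil tzfile for Europe/London stands in for mktz().
via_dateutil = idx_utc.tz_convert(dateutil_tz.gettz('Europe/London'))

# Both indexes should describe the same instants...
same_instants = via_pytz.tz_convert('UTC').equals(via_dateutil.tz_convert('UTC'))
print('same instants:', same_instants)

# ...so any difference has to be in the UTC offset attached to each Timestamp.
expected = [timedelta(hours=1), timedelta(hours=1), timedelta(0), timedelta(0)]
pytz_offsets = [ts.utcoffset() for ts in via_pytz]
dateutil_offsets = [ts.utcoffset() for ts in via_dateutil]

for ts, want, got in zip(idx_utc, expected, dateutil_offsets):
    if got != want:
        print('offset mismatch at %s: expected %s, got %s' % (ts, want, got))
```

A mismatch on the first two rows only would be consistent with the output in the original report, where those rows were printed as +00:00 instead of +01:00.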