[BUG] Datetime with timezone=UTC breaks date labels
Hello!
If the dates in a pandas DataFrame carry a timezone and look like Timestamp('2019-12-16 07:03:00+0000', tz='UTC'), the corresponding axis labels end up near the year 1970, because t in the following lambda arrives in microseconds instead of nanoseconds, yet is still divided by 1e9 as if it were nanoseconds.
def _x2local_t(datasrc, x):
if display_timezone == None:
return _x2utc(datasrc, x)
return _x2t(datasrc, x, lambda t: _millisecond_tz_wrap(datetime.fromtimestamp(t/1e9, tz=display_timezone).strftime(timestamp_format)))
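To illustrate the scale mismatch, here is a small standalone sketch (mine, not finplot code): a microsecond epoch value divided by 1e9, as if it were nanoseconds, lands in January 1970.
import numpy as np
from datetime import datetime, timezone

# Hypothetical demo of the mismatch described above, not finplot code:
t_us = np.datetime64('2019-12-16T07:03:00', 'us').astype('int64')  # epoch in microseconds
print(datetime.fromtimestamp(t_us / 1e9, tz=timezone.utc))         # -> ~1970-01-19, not 2019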
But if I forcibly strip the timezone info, everything works fine:
df['date'] = df['date'].apply(lambda x: x.replace(tzinfo=None))
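(A vectorized equivalent, for what it's worth; my phrasing, not from the original report:)
df['date'] = df['date'].dt.tz_localize(None)  # drop the timezone for the whole column at once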
Interesting, I'll look into it, but unfortunately I'm very short on time at the moment. Might be able to squeeze it in the next couple of days.
Looks like I'll have to keep track of the timestamp datatype before converting, if done at all, and otherwise try to infer it from the whole duration. If you could give me a small and complete example duplicating the bug, I'd be much obliged!
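Something along these lines, perhaps; a rough sketch of inferring the epoch unit from the magnitude of a timestamp (helper name and thresholds are my own, not the eventual fix):
def guess_epoch_unit(t):
    # Approximate thresholds: any date between ~1971 and ~2100 maps to the right unit.
    if t < 1e10:
        return 's'   # seconds since epoch
    if t < 1e13:
        return 'ms'  # milliseconds
    if t < 1e16:
        return 'us'  # microseconds
    return 'ns'      # nanoseconds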
I think I've figured out what the problem is. A DataFrame's default datetime resolution is nanoseconds: calling
df['date'] = pd.to_datetime(df['date']) always converts to datetime64[ns, ...]. I get my dataframes from DuckDB queries, and in my case the dates come back as datetime64[us, ...].
Example:
import finplot as fplt
import pandas as pd
df = pd.DataFrame(data={
    "date": [
        "2024-10-01 21:00:00+00:03",
        "2024-10-02 21:00:00+00:03",
        "2024-10-03 21:00:00+00:03",
        "2024-10-04 21:00:00+00:03",
        "2024-10-05 21:00:00+00:03"],
    "open": [136.50, 132.72, 133.52, 133.75, 133.49],
    "high": [137.16, 134.16, 134.72, 133.75, 133.62],
    "low": [131.70, 130.32, 132.03, 133.00, 132.57],
    "close": [132.12, 133.87, 133.32, 133.02, 133.33],
    "volume": [6701453, 5758141, 2886637, 31952, 24598],
})
df['date'] = pd.to_datetime(df['date']) #default resolution is 'ns'
print(df['date'])
df['date'] = df['date'].astype("datetime64[us, UTC+00:03]") #comment this to fix dates on charts
print(df['date'])
ac, av = fplt.create_plot("candles chart", rows=2, maximize=False)
fplt.candlestick_ochl(df[['date','open', 'close', 'high', 'low']], ax=ac)
fplt.volume_ocv(df[['date','open', 'close', 'volume']], ax=av)
fplt.show()
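As an aside, on a finplot version without the fix, one possible workaround (my suggestion, the inverse of the astype line above) would be to cast back to nanosecond resolution before plotting:
df['date'] = df['date'].astype("datetime64[ns, UTC+00:03]")  # back to pandas' default ns resolution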
Hm. I changed something related to this some months ago, but got sidetracked. Please check out 6060590f to see if that helps with this in any way (or makes it worse). This is what I'm getting when I try your code (without the astype conversion):
This is also what I'm expecting, as the data points are 24 hrs apart.
Yes, https://github.com/highfestiva/finplot/commit/6060590fb697ccf4938532a84ba4d5ab94ca5008 fixes the issue, thanks!