yahooquery
[Question] Same code returns different results in different environments
Hi, thanks again for the cool library. I'm using it in two different environments (Win and Linux), same yahooquery version (2.2.8), same Python version (2.8.5). This code returns different results:
t=Ticker('AAPL').history(period='1d',start=date,end=date)
print(type(t))
On the Windows environment it returns <class 'pandas.core.frame.DataFrame'>, and that's what I expect.
However, on Linux it returns <class 'dict'>.
The value of t on Linux is:
{'AAPL': {'meta': {'currency': 'USD', 'symbol': 'AAPL', 'exchangeName': 'NMS', 'instrumentType': 'EQUITY', 'firstTradeDate': 345479400, 'regularMarketTime': 1605819601, 'gmtoffset': -18000, 'timezone': 'EST', 'exchangeTimezoneName': 'America/New_York', 'regularMarketPrice': 118.64, 'chartPreviousClose': 118.03, 'priceHint': 2, 'currentTradingPeriod': {'pre': {'timezone': 'EST', 'start': 1605776400, 'end': 1605796200, 'gmtoffset': -18000}, 'regular': {'timezone': 'EST', 'start': 1605796200, 'end': 1605819600, 'gmtoffset': -18000}, 'post': {'timezone': 'EST', 'start': 1605819600, 'end': 1605834000, 'gmtoffset': -18000}}, 'dataGranularity': '1d', 'range': '', 'validRanges': ['1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max']}, 'indicators': {'quote': [{}], 'adjclose': [{}]}}}
Could you please advise on what's wrong here?
What are the date arguments that you’re using? The reason the dictionary is returned instead of a dataframe is because there was no data returned. The data returned from that endpoint will be in the indicators key, which you can see from what you pasted above is empty.
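A caller can guard against this case before treating the result as a DataFrame. The sketch below is a minimal check that assumes the dictionary shape pasted above; `has_price_data` is a hypothetical helper, not part of yahooquery.

```python
def has_price_data(result, symbol):
    """Return True if the result for `symbol` appears to carry quote data.

    Assumption: a raw payload is shaped like the dictionary pasted above,
    {symbol: {'meta': {...}, 'indicators': {'quote': [{}], ...}}}.
    """
    if not isinstance(result, dict):
        # Anything that is not a dict (e.g. a DataFrame) is treated as data.
        return True
    payload = result.get(symbol)
    if not isinstance(payload, dict):
        return False
    quotes = payload.get("indicators", {}).get("quote", [])
    # An empty dict inside the list means the endpoint returned no rows.
    return any(entry for entry in quotes)


# Trimmed copy of the Linux result from the question.
empty = {"AAPL": {"meta": {"symbol": "AAPL"},
                  "indicators": {"quote": [{}], "adjclose": [{}]}}}
print(has_price_data(empty, "AAPL"))  # False
```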
@dpguthrie I initially thought that could be the reason, but I checked and the date is the same: 2020-11-13. Just ran the same test, same result, printed the date on both computers, both show 2020-11-13. Can it be somehow related to the timezone setting? The computers are in different time zones (but the difference is not big, so it's the same date on both at the moment).
@midnightdim I'm unable to replicate your first scenario - actually receiving a dataframe with the parameters you're using. I'm on windows, python 3.7.9
date = '2020-11-13'
Ticker('aapl').history(start=date, end=date, period='1d')
Using those parameters, I'm receiving the same dictionary you have above. The url that's generated with those parameters is the following: https://query2.finance.yahoo.com/v8/finance/chart/aapl?period1=1605250800&period2=1605250800&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com
I do think though that you've found some problems with the way the logic is structured in that method.
I made the assumption that if the start and end dates are specified, the period argument is no longer needed, so it's not actually being used as a query parameter in the request. This should change, because when that argument is included, the appropriate data is returned.
@dpguthrie It's interesting. I'm still receiving this dataframe.
<class 'pandas.core.frame.DataFrame'>
close open high volume low adjclose
symbol date
AAPL 2020-11-12 119.209999 119.620003 120.529999 103162300 118.57 119.209999
How can I check which URL is generated?
You can put a print statement in the _get_data method of the _YahooFinance class. Right after the line that assigns urls, add the print call shown at the end:
def _get_data(self, key, params={}, **kwargs):
    config = self._CONFIG[key]
    params = self._construct_params(config, params)
    urls = self._construct_urls(config, params, **kwargs)
    # Temporary debugging aid: show every URL that will be requested.
    print([r.url for r in urls])
@dpguthrie Added it. This is the code:
print(date)
t = Ticker('AAPL').history(period='1d',start=date,end=date)
print(type(t))
Here's what I got on the box where it works fine:
2020-12-30
['https://query2.finance.yahoo.com/v8/finance/chart/AAPL?period1=1609261200&period2=1609261200&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com']
<class 'pandas.core.frame.DataFrame'>
Here's what I got on the box where I need it to work (but it doesn't):
2020-12-30
['https://query2.finance.yahoo.com/v8/finance/chart/AAPL?period1=1609286400&period2=1609286400&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com']
<class 'dict'>
dict_keys(['meta', 'indicators'])
The period values are different.
I thought that the problem was caused by the timezone difference, so I switched to using UTC-based dates. Now they are exactly the same on the two machines, but the code still generates different URLs. I think the problem is in the _convert_to_timestamp function - it seems to be timezone-unaware. As a workaround I'm just adding a timedelta to the date to compensate for the timezone difference.
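Decoding the two period1 values from the printed URLs makes the discrepancy concrete. The snippet below uses only the standard library; `describe` is a helper name introduced here for illustration.

```python
from datetime import datetime, timezone


def describe(epoch_seconds):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


working_box = 1609261200  # period1 on the machine that returns a DataFrame
failing_box = 1609286400  # period1 on the machine that returns a dict

print(describe(working_box))  # 2020-12-29T17:00:00+00:00
print(describe(failing_box))  # 2020-12-30T00:00:00+00:00
print((failing_box - working_box) / 3600)  # 7.0
```

The seven-hour gap is consistent with each machine reading 2020-12-30 as midnight in its own local timezone, one of them seven hours ahead of the other.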
Just one thing to add to this. When I also print the result, I have this on my local box:
2020-12-30
['https://query2.finance.yahoo.com/v8/finance/chart/AAPL?period1=1609261200&period2=1609261200&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com']
<class 'pandas.core.frame.DataFrame'>
high low open volume close adjclose
symbol date
AAPL 2020-12-29 138.789993 134.339996 138.050003 121047300 134.869995 134.869995
The local date is 2020-12-30 for me, but as you can see, it's 2020-12-29 when it's converted (to GMT?).
Even if I pass it as a string ('2020-12-30') it gets converted to 2020-12-29.
This is the expected result for me, but it's something to consider. Maybe the history function should also accept timestamps, so that users can prepare them themselves and pass them in.
The problem is definitely in the _convert_to_timestamp function. But, as far as the history method is concerned - you can pass a datetime.datetime object for the start and end parameters.
But, as far as the history method is concerned - you can pass a datetime.datetime object for the start and end parameters.
That's exactly what I did in my first examples. I basically took datetime.datetime.today() and subtracted a number of days to get the price from the past. It doesn't matter if I use today minus N days or pass that date as a string - the results are exactly the same (and they differ on computers with different timezones).
Also, that's what I'm using now in this workaround - I add a timedelta of several hours to compensate for the difference.
The problem is - even if I set the date to, say, 12/30/2020 00:00 UTC, _convert_to_timestamp converts it to a timestamp according to the current timezone, and it yields different values in different timezones.
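A variation on the timedelta workaround can be sketched without hard-coding the offset. It assumes the library's conversion behaves like `time.mktime(dt.timetuple())`, i.e. it reads the datetime fields as local time; `local_naive_for_utc` is a hypothetical helper, not part of yahooquery.

```python
import time
from datetime import datetime, timezone


def local_naive_for_utc(year, month, day):
    """Return a naive local datetime that denotes midnight UTC of the date.

    Assumption: the library converts with time.mktime(dt.timetuple()),
    which interprets the fields as local time.
    """
    epoch = datetime(year, month, day, tzinfo=timezone.utc).timestamp()
    # fromtimestamp() without tz returns naive local wall-clock time.
    return datetime.fromtimestamp(epoch)


start = local_naive_for_utc(2020, 12, 30)
# Mimic the local-time conversion described in the thread.
print(int(time.mktime(start.timetuple())))  # 1609286400
```

Passing such a value as start or end should make every machine request the same period values, provided the assumption about the conversion holds.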
+1 I'm running into the same issue and just spent hours trying to find it. Caused by running in different time zones.
@maxwhoppa @midnightdim Like the others above, I observed similar behavior when switching my local machine's timezone from PST to HKT with the same code, shown below. All I'm changing is the timezone on my computer (e.g. in the Control Panel on a Windows machine). I had thought that pinning the start and end dates to EST and setting adj_timezone=True (which is supposed to adjust the times to AAPL's local time) would give me the correct result.
Expected behaviour: local EST for AAPL
import datetime
from dateutil import tz
from yahooquery import Ticker
# data_util is the poster's own helper module
from data_util import convert_ticker_list_to_symbols_string
symbols = convert_ticker_list_to_symbols_string(['AAPL'])
tickers_obj = Ticker(symbols, asynchronous=True, adj_ohlc=True, progress=True, validate=True, timeout=999999)
NYC = tz.gettz('America/New_York')
sdt = datetime.datetime(2020,1,1,tzinfo=NYC)
edt = datetime.datetime(2020,1,11,tzinfo=NYC)
df = tickers_obj.history(start=sdt, end=edt,interval='1h',adj_timezone=True)
For Pacific Time I got
high close ... open low
symbol date ...
AAPL 2020-01-02 10:30:00 298.410004 297.540009 ... 296.239990 295.190002
2020-01-02 11:30:00 298.750000 298.109985 ... 297.540009 297.109985
2020-01-02 12:30:00 298.799988 298.285004 ... 298.100006 297.450012
For HKT I got
volume open ... high low
symbol date ...
AAPL 2019-12-31 11:30:00 0.0 291.219910 ... 292.260010 290.779999
2019-12-31 12:30:00 2082131.0 291.850006 ... 292.269989 291.399994
2019-12-31 13:30:00 1725654.0 292.279999 ... 292.470001 291.880005
2019-12-31 14:30:00 1403135.0 292.089996 ... 292.299988 291.989990
2019-12-31 15:30:00 3089978.0 292.266205 ... 293.260010 292.130005
2019-12-31 16:30:00 3683862.0 293.180115 ... 293.679993 292.920013
2020-01-02 10:30:00 9941640.0 296.239990 ... 298.410004 295.190002
2020-01-02 11:30:00 4795404.0 297.540009 ... 298.750000 297.109985
2020-01-02 12:30:00 3345181.0 298.100006 ... 298.799988 297.450012
I experimented with mktime in _convert_to_timestamp (line 1228 of ticker.py), where it is used to convert the datetime to a timestamp. According to this link, mktime assumes its input is local time, so regardless of the start time's timezone, once it passes through mktime it is interpreted as local time.
One can run the code below and change the system timezone to observe the difference. A potential solution is to consider the following. @dpguthrie I'm not familiar with how to contribute to an open source project, but would love to contribute.
- convert the start time (either naive local or timezone-aware) to UTC
- have _convert_to_timestamp convert the start time to a timestamp with calendar.timegm (the same applies to the end time)
Results generated from the code below:
CURRENT TIME HKT
MK TIME 1577808000
GM TIME 1577836800
(now switching the timezone from the Windows Control Panel)
CURRENT TIME PST
MK TIME 1577865600
GM TIME 1577836800
In both runs GM TIME stays the same, whereas MK TIME varies when I change the timezone.
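import time
import calendar
import datetime
from dateutil import tz
# here we pin sdt to UTC
sdt = datetime.datetime(2020,1,1,tzinfo=tz.UTC)
# mktime reads the time tuple as local time; timegm reads it as UTC
date_mktime = int(time.mktime(sdt.timetuple()))
date_gmtime = int(calendar.timegm(sdt.timetuple()))
# change this label by hand to match the system timezone of the run
print("CURRENT TIME PST")
print("MK TIME ", date_mktime)
print("GM TIME ", date_gmtime)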
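The two-step proposal above (normalize to UTC, then convert with calendar.timegm) could look roughly like the following. This is a sketch, not the library's actual code: `to_utc_timestamp` is a hypothetical name, and treating naive inputs as UTC is a design choice made here for illustration.

```python
import calendar
from datetime import date, datetime, timedelta, timezone


def to_utc_timestamp(value):
    """Convert a 'YYYY-MM-DD' string, date, or datetime to a Unix timestamp.

    Hypothetical replacement for the conversion step. Naive inputs are
    assumed to already be expressed in UTC.
    """
    if isinstance(value, str):
        value = datetime.strptime(value, "%Y-%m-%d")
    elif isinstance(value, date) and not isinstance(value, datetime):
        value = datetime(value.year, value.month, value.day)
    if value.tzinfo is not None:
        # Step 1: normalize timezone-aware input to UTC.
        value = value.astimezone(timezone.utc)
    # Step 2: timegm() reads the time tuple as UTC, unlike mktime().
    return calendar.timegm(value.timetuple())


hkt = timezone(timedelta(hours=8))
print(to_utc_timestamp("2020-01-01"))                         # 1577836800
print(to_utc_timestamp(datetime(2020, 1, 1, 8, tzinfo=hkt)))  # 1577836800
```

Because neither step consults the system timezone, the result does not change when the machine's timezone setting changes.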