[Question] Same code returns different results in different environments

Open midnightdim opened this issue 3 years ago • 14 comments

Hi, thanks again for the cool library. I'm using it in two different environments (Win and Linux), same yahooquery version (2.2.8), same Python version (2.8.5). This code returns different results:

t=Ticker('AAPL').history(period='1d',start=date,end=date)
print(type(t))

On Win environment it returns <class 'pandas.core.frame.DataFrame'>, and that's what I expect. However, in Linux it returns <class 'dict'>. The value of t on Linux is:

{'AAPL': {'meta': {'currency': 'USD', 'symbol': 'AAPL', 'exchangeName': 'NMS', 'instrumentType': 'EQUITY', 'firstTradeDate': 345479400, 'regularMarketTime': 1605819601, 'gmtoffset': -18000, 'timezone': 'EST', 'exchangeTimezoneName': 'America/New_York', 'regularMarketPrice': 118.64, 'chartPreviousClose': 118.03, 'priceHint': 2, 'currentTradingPeriod': {'pre': {'timezone': 'EST', 'start': 1605776400, 'end': 1605796200, 'gmtoffset': -18000}, 'regular': {'timezone': 'EST', 'start': 1605796200, 'end': 1605819600, 'gmtoffset': -18000}, 'post': {'timezone': 'EST', 'start': 1605819600, 'end': 1605834000, 'gmtoffset': -18000}}, 'dataGranularity': '1d', 'range': '', 'validRanges': ['1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max']}, 'indicators': {'quote': [{}], 'adjclose': [{}]}}}

Could you please advise on what's wrong here?

midnightdim avatar Nov 20 '20 05:11 midnightdim

What are the date arguments that you’re using? The reason the dictionary is returned instead of a dataframe is because there was no data returned. The data returned from that endpoint will be in the indicators key, which you can see from what you pasted above is empty.
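A caller can guard against the dict case before treating the result as a dataframe. The following is a minimal sketch, assuming the dict has the same shape as the one pasted above (symbol, then 'indicators', then 'quote'); `symbols_without_data` is a hypothetical helper name, not part of yahooquery.

```python
def symbols_without_data(result):
    """List symbols whose 'indicators' -> 'quote' entries are all empty.

    Assumption: a dict result has the shape shown in this thread;
    anything that is not a dict is treated as a dataframe with rows.
    """
    if not isinstance(result, dict):
        return []
    return [
        symbol
        for symbol, payload in result.items()
        if isinstance(payload, dict)
        and not any(payload.get("indicators", {}).get("quote", []))
    ]


# Trimmed copy of the dict pasted above
sample = {"AAPL": {"meta": {"symbol": "AAPL"},
                   "indicators": {"quote": [{}], "adjclose": [{}]}}}
print(symbols_without_data(sample))
```

For the trimmed sample this prints `['AAPL']`, because the only quote entry is an empty dict.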

dpguthrie avatar Nov 20 '20 14:11 dpguthrie

@dpguthrie I initially thought that could be the reason, but I checked and the date is the same: 2020-11-13. Just ran the same test, same result, printed the date on both computers, both show 2020-11-13. Can it be somehow related to the timezone setting? The computers are in different time zones (but the difference is not big, so it's the same date on both at the moment).

midnightdim avatar Nov 20 '20 15:11 midnightdim

@midnightdim I'm unable to replicate your first scenario - actually receiving a dataframe with the parameters you're using. I'm on windows, python 3.7.9

date = '2020-11-13'
Ticker('aapl').history(start=date, end=date, period='1d')

Using those parameters, I'm receiving the same dictionary you have above. The url that's generated with those parameters is the following: https://query2.finance.yahoo.com/v8/finance/chart/aapl?period1=1605250800&period2=1605250800&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com

dpguthrie avatar Nov 20 '20 16:11 dpguthrie

I do think though that you've found some problems with the way the logic is structured in that method.

I made the assumption that if the start and end dates are specified, the period argument is no longer needed - so it's not actually being used as a query parameter in the request. This should change. Because when that argument is included, the appropriate data is returned.

dpguthrie avatar Nov 20 '20 16:11 dpguthrie

@dpguthrie It's interesting. I'm still receiving this dataframe.

<class 'pandas.core.frame.DataFrame'>
                        close        open        high     volume     low    adjclose
symbol date
AAPL   2020-11-12  119.209999  119.620003  120.529999  103162300  118.57  119.209999

How can I check which URL is generated?

midnightdim avatar Nov 23 '20 06:11 midnightdim

You can put a print statement in the _get_data method of the _YahooFinance class. After urls just put:

    def _get_data(self, key, params={}, **kwargs):
        config = self._CONFIG[key]
        params = self._construct_params(config, params)
        urls = self._construct_urls(config, params, **kwargs)
        print([r.url for r in urls])

dpguthrie avatar Nov 23 '20 21:11 dpguthrie

@dpguthrie Added it. This is the code:

print(date)
t = Ticker('AAPL').history(period='1d',start=date,end=date)
print(type(t))

Here's what I got on the box where it works fine:

2020-12-30
['https://query2.finance.yahoo.com/v8/finance/chart/AAPL?period1=1609261200&period2=1609261200&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com']
<class 'pandas.core.frame.DataFrame'>

Here's what I got on the box where I need it to work (but it doesn't):

2020-12-30
['https://query2.finance.yahoo.com/v8/finance/chart/AAPL?period1=1609286400&period2=1609286400&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com']
<class 'dict'>
dict_keys(['meta', 'indicators'])

The period values are different.
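To see what the two period values mean, they can be decoded as UTC datetimes. This is a small standard-library sketch; `period_bounds_utc` is a hypothetical helper name, and the URLs are shortened copies of the two printed above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def period_bounds_utc(url):
    """Return (period1, period2) from a chart URL as UTC datetimes."""
    query = parse_qs(urlparse(url).query)
    return tuple(
        datetime.fromtimestamp(int(query[key][0]), tz=timezone.utc)
        for key in ("period1", "period2")
    )


base = "https://query2.finance.yahoo.com/v8/finance/chart/AAPL"
working_box = base + "?period1=1609261200&period2=1609261200&interval=1d"
failing_box = base + "?period1=1609286400&period2=1609286400&interval=1d"

print(period_bounds_utc(working_box)[0])  # 2020-12-29 17:00:00+00:00
print(period_bounds_utc(failing_box)[0])  # 2020-12-30 00:00:00+00:00
```

The two values are 25200 seconds (7 hours) apart: the second is midnight UTC on 2020-12-30, while the first is midnight on 2020-12-30 in a zone 7 hours ahead of UTC.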

midnightdim avatar Jan 06 '21 03:01 midnightdim

I thought that the problem was caused by the timezone difference, so I switched to using UTC-based dates. Now they are exactly the same on the two machines, but the code still generates different URLs. I think the problem is in the _convert_to_timestamp function - it seems to be timezone-unaware. As a workaround I'm just adding some timedelta to the date to compensate for the timezone difference.

midnightdim avatar Jan 06 '21 04:01 midnightdim

Just one thing to add to this. When I also print the result, I have this on my local box:

2020-12-30
['https://query2.finance.yahoo.com/v8/finance/chart/AAPL?period1=1609261200&period2=1609261200&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com']
<class 'pandas.core.frame.DataFrame'>
                         high         low        open     volume       close    adjclose
symbol date
AAPL   2020-12-29  138.789993  134.339996  138.050003  121047300  134.869995  134.869995

The local date is 2020-12-30 for me, but as you can see, it's 2020-12-29 when it's converted (to GMT?). Even if I pass it as a string ('2020-12-30') it gets converted to 2020-12-29. This is the expected result for me, but it's something to consider. Maybe history function should also accept timestamps so that the user can prepare it him/herself and put them there.

midnightdim avatar Jan 06 '21 05:01 midnightdim

The problem is definitely in the _convert_to_timestamp function. But, as far as the history method is concerned - you can pass a datetime.datetime object for the start and end parameters.

dpguthrie avatar Jan 06 '21 17:01 dpguthrie

> But, as far as the history method is concerned - you can pass a datetime.datetime object for the start and end parameters.

That's exactly what I did in my first examples. I basically took datetime.datetime.today() and subtracted a number of days to get the price from the past. It doesn't matter if I use today minus N days or pass that date as a string - the results are exactly the same (and they differ on computers with different timezones).

Also, that's what I'm using now in this workaround - I add a timedelta of several hours to compensate for the difference. The problem is - even if I set the date to, say, 12/30/2020 00:00 UTC, _convert_to_timestamp converts it to a timestamp according to the current timezone, and it yields different values in different timezones.

midnightdim avatar Jan 06 '21 18:01 midnightdim

+1 I'm running into the same issue and just spent hours trying to find it. Caused by running in different time zones.

maxwhoppa avatar May 02 '21 20:05 maxwhoppa

@maxwhoppa @midnightdim Like the fellow gents mentioned above, I observed similar behavior when switching my local machine time from PST to HKT with the same code shown below. All I changed was the timezone on my computer (e.g. in Control Panel on a Windows machine). I had thought that pinning the start and end dates to EST and setting adj_timezone=True (which is supposed to adjust the time to the local time for AAPL) would give me the correct result.

Expected behaviour: local EST for AAPL

```
import datetime

from dateutil import tz  # assuming tz here is dateutil.tz
from yahooquery import Ticker

from data_util import convert_ticker_list_to_symbols_string  # my own helper module

symbols = convert_ticker_list_to_symbols_string(['AAPL'])
tickers_obj = Ticker(symbols, asynchronous=True, adj_ohlc=True, progress=True, validate=True, timeout=999999)

# pin the start and end dates to New York time
NYC = tz.gettz('America/New_York')
sdt = datetime.datetime(2020,1,1,tzinfo=NYC)
edt = datetime.datetime(2020,1,11,tzinfo=NYC)

df = tickers_obj.history(start=sdt, end=edt,interval='1h',adj_timezone=True)
```

For Pacific Time I got

                                  high       close  ...        open         low
symbol date                                         ...
AAPL   2020-01-02 10:30:00  298.410004  297.540009  ...  296.239990  295.190002
       2020-01-02 11:30:00  298.750000  298.109985  ...  297.540009  297.109985
       2020-01-02 12:30:00  298.799988  298.285004  ...  298.100006  297.450012

For HKT I got

                               volume        open  ...        high         low
symbol date                                        ...
AAPL   2019-12-31 11:30:00        0.0  291.219910  ...  292.260010  290.779999
       2019-12-31 12:30:00  2082131.0  291.850006  ...  292.269989  291.399994
       2019-12-31 13:30:00  1725654.0  292.279999  ...  292.470001  291.880005
       2019-12-31 14:30:00  1403135.0  292.089996  ...  292.299988  291.989990
       2019-12-31 15:30:00  3089978.0  292.266205  ...  293.260010  292.130005
       2019-12-31 16:30:00  3683862.0  293.180115  ...  293.679993  292.920013
       2020-01-02 10:30:00  9941640.0  296.239990  ...  298.410004  295.190002
       2020-01-02 11:30:00  4795404.0  297.540009  ...  298.750000  297.109985
       2020-01-02 12:30:00  3345181.0  298.100006  ...  298.799988  297.450012

darkknight9394 avatar May 23 '21 02:05 darkknight9394

I experimented with mktime in _convert_to_timestamp (line 1228 of ticker.py), where it is used to convert the datetime to a timestamp. It turns out from this link that mktime assumes its input is local time, so regardless of the start time's timezone, once it passes through mktime it is interpreted as local time.

One can run the code below and try changing the timezone to observe the difference. Perhaps a potential solution is to consider the following. @dpguthrie I'm not familiar with how to contribute to an open source project, but would love to contribute.

  1. Convert the start time (either local or timezone-aware) to UTC.
  2. Have _convert_to_timestamp convert the start time to a timestamp with calendar.timegm (the same applies to the end time).
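The two steps above can be sketched as follows. This is only an illustration, not yahooquery's actual code: `to_utc_timestamp` is a hypothetical name, and treating naive datetimes and date strings as UTC is an assumption made for the example.

```python
import calendar
from datetime import datetime, timedelta, timezone


def to_utc_timestamp(value):
    """Convert a 'YYYY-MM-DD' string or datetime to a Unix timestamp.

    Assumption: naive inputs are interpreted as UTC, so the result does
    not depend on the machine's local timezone.
    """
    if isinstance(value, str):
        value = datetime.strptime(value, "%Y-%m-%d")
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    # Step 1: normalize to UTC. Step 2: timegm reads the fields as UTC.
    return calendar.timegm(value.astimezone(timezone.utc).timetuple())


print(to_utc_timestamp("2020-12-30"))  # 1609286400
new_york_winter = timezone(timedelta(hours=-5))
print(to_utc_timestamp(datetime(2020, 1, 1, tzinfo=new_york_winter)))  # 1577854800
```

Because every input is normalized to UTC before calendar.timegm is applied, the same date yields the same timestamp on every machine.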

results generated from code below

CURRENT TIME HKT
MK TIME 1577808000
GM TIME 1577836800

(now switching timezone from windows control panel)

CURRENT TIME PST
MK TIME 1577865600
GM TIME 1577836800

In both runs the GM TIME stays the same, whereas the MK TIME varies when I change the timezone.

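```
import calendar
import datetime
import time

from dateutil import tz  # assuming tz here is dateutil.tz (tz.UTC needs dateutil >= 2.7)

# here we pin sdt to UTC
sdt = datetime.datetime(2020,1,1,tzinfo=tz.UTC)
date_mktime = int(time.mktime(sdt.timetuple()))    # interprets the fields as local time
date_gmtime = int(calendar.timegm(sdt.timetuple()))  # interprets the fields as UTC

print("CURRENT TIME PST")
print("MK TIME ", date_mktime )
print("timegm ",date_gmtime )
```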
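The same comparison can be reproduced without touching the system clock settings. This is a Unix-only sketch (time.tzset is not available on Windows); `stamps_under_tz` is a hypothetical helper, and "HKT-8" and "PST8" are POSIX TZ strings for fixed UTC+8 and UTC-8 offsets, used here so the example does not depend on an installed timezone database.

```python
import calendar
import os
import time
from datetime import datetime, timezone


def stamps_under_tz(posix_tz, dt):
    """Return (mktime, timegm) for dt with the process timezone set to posix_tz."""
    previous = os.environ.get("TZ")
    os.environ["TZ"] = posix_tz
    time.tzset()  # Unix only
    try:
        fields = dt.timetuple()  # the tzinfo is dropped at this point
        return int(time.mktime(fields)), calendar.timegm(fields)
    finally:
        # Restore the original process timezone
        if previous is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = previous
        time.tzset()


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
for zone in ("HKT-8", "PST8"):
    mk, gm = stamps_under_tz(zone, start)
    print(zone, "mktime:", mk, "timegm:", gm)
```

The mktime value shifts by the zone offset (1577808000 under UTC+8, 1577865600 under UTC-8), while timegm stays at 1577836800, matching the numbers reported above.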

darkknight9394 avatar May 23 '21 03:05 darkknight9394