yahooquery
Improve error handling for dates outside of Yahoo!'s supported range
Certain errors from Yahoo!'s API responses are not properly handled when a wide price history range is specified. As an example, the following uses a broad range of dates, from the year 1902 to today, to get maximum history (rather than the 'max' period type, so that our code can specify date ranges without a special case). Yahoo! complains that it provides only 100 years of data. Of course, we will not use 1902 going forward; perhaps we will use 1930 until 2030 comes along, a hack for which a reminder will be set in someone's Google Calendar.
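As an alternative to a hard-coded start year, the start date could be derived from today's date. This is only a sketch: the helper name earliest_start and the 99-year margin are my own assumptions, chosen to stay under the 100-year limit quoted in Yahoo!'s error message. Nothing here is part of yahooquery.

```python
from datetime import date

# Assumption: 99 years keeps a request safely under the 100-year limit
# that Yahoo!'s error message mentions for day-granularity data.
MAX_YEARS = 99


def earliest_start(today=None, max_years=MAX_YEARS):
    """Return the earliest start date that stays within max_years of today."""
    today = today or date.today()
    try:
        return today.replace(year=today.year - max_years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - max_years, day=28)


print(earliest_start(date(2020, 8, 28)))
```

The result could then be used wherever the 1902 start date is used today, which removes the need for a calendar reminder.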
yahooquery processes the erroneous results as if they were valid, returning a dict rather than a dataframe. When a list of symbols is provided to Tickers, these errors are incorrectly intermixed with correct results for symbols Yahoo! doesn't complain about.
Try this URL as an example:
https://query2.finance.yahoo.com/v8/finance/chart/%5ETYVIX?period1=-2145898800&period2=1598652741&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com
It returns this:
{"chart":{"result":null,"error":{"code":"Unprocessable Entity","description":"1d data not available for startTime=-2145898800 and endTime=1598652741. Only 100 years worth of day granularity data are allowed to be fetched per request."}}}
yahooquery then returns this sort of erroneous blob, intermixing error strings with dataframes:
[5281 rows x 6 columns], '^CASE30': '1d data not available for startTime=-2145898800 and endTime=1598653320. Only 100 years worth of day granularity data are allowed to be fetched per request.', '^HSI': high low close open volume adjclose
1986-12-31 2568.300049 2568.300049 2568.300049 2568.300049 0.000000e+00 2568.300049
1987-01-02 2540.100098 2540.100098 2540.100098 2540.100098 0.000000e+00 2540.100098
1987-01-05 2552.399902 2552.399902 2552.399902 2552.399902 0.000000e+00 2552.399902
1987-01-06 2583.899902 2583.899902 2583.899902 2583.899902 0.000000e+00 2583.899902
1987-01-07 2607.100098 2607.100098 2607.100098 2607.100098 0.000000e+00 2607.100098
With these types:
<class 'dict'>
<class 'str'>
<class 'pandas.core.frame.DataFrame'>
<class 'str'>
<class 'pandas.core.frame.DataFrame'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'pandas.core.frame.DataFrame'>
<class 'str'>
<class 'str'>
The history method is documented as
Returns
-------
pandas.DataFrame
historical pricing data
Which is not true in this case. It looks like the assumptions made in _historical_data_to_dataframe need some revision. I tried to figure out what use cases this was intended to handle, but there are no comments and it was not obvious to me.
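For illustration, here is roughly how the error in the raw chart response could be detected before any conversion to a dataframe is attempted. The function name chart_error is hypothetical and this is not how yahooquery is implemented. It only relies on the "chart" / "error" / "description" layout visible in the response quoted above.

```python
import json

# The response body quoted earlier in this issue, verbatim.
raw = (
    '{"chart":{"result":null,"error":{"code":"Unprocessable Entity",'
    '"description":"1d data not available for startTime=-2145898800 and '
    'endTime=1598652741. Only 100 years worth of day granularity data are '
    'allowed to be fetched per request."}}}'
)


def chart_error(payload):
    """Return the error text from a chart payload, or None if there is no error."""
    error = (payload.get("chart") or {}).get("error")
    if error:
        return error.get("description") or error.get("code")
    return None


message = chart_error(json.loads(raw))
if message is not None:
    print("Yahoo! rejected the request:", message)
```

A check like this would let the library raise or report the problem up front instead of passing the error text along as if it were price data.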
Thanks to everyone who has contributed to this library and to @dpguthrie for putting it together.
@dpguthrie Here's another case in the same vein. Run a query for historical prices on a symbol list that includes stocks and mutual funds, even if unintentional, and erroneous results are returned. e.g., try "BF-B" and "BF.B", one is Brown-Forman, a stock, the other a mutual fund which passes Yahoo!'s validation API just fine because they have data on file.
This produces the same kind of erroneous dict blob rather than a dataframe as the history API promises:
[5083 rows x 7 columns], 'BF.B': {'meta': {'currency': None, 'symbol': 'BF.B', 'exchangeName': 'YHD', 'instrumentType': 'MUTUALFUND', 'firstTradeDate': None, 'regularMarketTime': 1561759658, 'gmtoffset': -14400, 'timezone': 'EDT', 'exchangeTimezoneName': 'America/New_York', 'priceHint': 2, 'currentTradingPeriod': {'pre': {'timezone': 'EDT', 'start': 1598601600, 'end': 1598621400, 'gmtoffset': -14400}, 'regular': {'timezone': 'EDT', 'start': 1598621400, 'end': 1598644800, 'gmtoffset': -14400}, 'post': {'timezone': 'EDT', 'start': 1598644800, 'end': 1598659200, 'gmtoffset': -14400}}, 'dataGranularity': '1d', 'range': '', 'validRanges': ['1mo', '3mo', '6mo', 'ytd', '1y', '2y', '5y', '10y', 'max']}, 'indicators': {'quote': [{}], 'adjclose': [{}]}}, 'SLB': low volume close high open adjclose dividends splits
1981-12-31 13.875000 449200 13.968750 14.000000 13.937500 1.809793 NaN NaN
1982-01-04 13.500000 630400 13.687500 13.968750 13.750000 1.773355 NaN NaN
1982-01-05 13.000000 1076800 13.000000 13.468750 13.250000 1.684282 NaN NaN
1982-01-06 12.750000 1560400 12.875000 12.968750 12.875000 1.668086 NaN NaN
1982-01-07 12.500000 1303600 12.625000 12.937500 12.718750 1.635697 NaN NaN
Thanks again for leaving some feedback.
This can either be documented a little better, changed slightly so only a dataframe is returned, or altered to return two separate elements (the dataframe and a dictionary containing symbols with no data). The idea behind returning the dictionary was so the user had an idea of which symbols weren't returning data. Initially, my thought was that the user could easily check and concat the dataframes themselves, but they would still know if any symbols were missing data. Something like:
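import pandas as pd
from yahooquery import Ticker

tickers = Ticker(symbols)
df = tickers.history()
# history() returns a dict when some symbols have no data; keep only the dataframes
if isinstance(df, dict):
    df = pd.concat([df[x] for x in df if isinstance(df[x], pd.DataFrame)])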
I am certainly open to suggestions though (the dict blob is definitely not an ideal thing to work with).
I'd think yahooquery should do this kind of work itself. It's particular to yahooquery's internals, and, to my eye, not something you'd want all your users to have to bury in their code. I'd suggest returning a tuple of (good, bad) results, i.e., (df, dict), or a dict/namedtuple if you want to name the returned items, à la return dict(good=df, bad=errordict). Clearly a breaking change, though. You could complicate the history API (and any other affected APIs) with a flag, but that is also messy. In the meantime, I'll adopt this logic into my code.
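To make the (good, bad) idea concrete, here is a rough sketch of a wrapper that callers could use in the meantime. The names HistoryResult and split_history are made up for this example, and the sample symbols and values are placeholders rather than real Yahoo! data. It assumes only what is described above: the result is either a dataframe or a dict whose values are dataframes, error strings, or raw dicts.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical container for the two kinds of results.
HistoryResult = namedtuple("HistoryResult", ["good", "bad"])


def split_history(result):
    """Split a history result into one dataframe and a dict of failed symbols."""
    if isinstance(result, pd.DataFrame):
        return HistoryResult(result, {})
    frames = {s: v for s, v in result.items() if isinstance(v, pd.DataFrame)}
    bad = {s: v for s, v in result.items() if not isinstance(v, pd.DataFrame)}
    # Concatenating a dict uses the symbols as the outer index level.
    good = pd.concat(frames) if frames else pd.DataFrame()
    return HistoryResult(good, bad)


# Placeholder input shaped like the mixed blob shown earlier.
mixed = {
    "AAA": pd.DataFrame({"close": [1.0, 2.0]}),
    "BBB": "1d data not available",
    "CCC": pd.DataFrame({"close": [3.0]}),
}
result = split_history(mixed)
print(result.good)
print(sorted(result.bad))
```

Returning a namedtuple keeps plain tuple unpacking (df, errors = split_history(...)) available while still giving the two parts names.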