Improve error handling for dates outside of Yahoo!'s supported range

Open shipmints opened this issue 5 years ago • 3 comments
Certain errors in Yahoo!'s API responses are not handled properly when a wide price-history range is specified. As an example, the following uses a broad date range, from the year 1902 to today, to get maximum history (rather than the 'max' period type, so that our code can specify date ranges without a special case). Yahoo! complains that it provides only 100 years of data. Of course, we will not use 1902 going forward; perhaps 1930, until 2030 comes along. It is a hack, for which a reminder will be set in someone's Google Calendar.
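A calendar-free alternative to hard-coding 1930 would be to clamp the start date relative to the end date. The following is only a sketch: the helper names `clamp_start` and `to_epoch` are hypothetical, and the 100-year figure is taken from Yahoo!'s error message, so the exact cutoff the API applies is an assumption.

```python
from datetime import date, datetime, timezone

# Assumption: the limit quoted in Yahoo!'s error message; the precise cutoff is not confirmed.
MAX_YEARS = 100


def clamp_start(start: date, end: date, max_years: int = MAX_YEARS) -> date:
    """Return start, moved forward if needed so the range spans at most max_years."""
    try:
        earliest = end.replace(year=end.year - max_years)
    except ValueError:
        # end is Feb 29 and the target year is not a leap year
        earliest = end.replace(year=end.year - max_years, day=28)
    return max(start, earliest)


def to_epoch(d: date) -> int:
    """Convert a date (midnight UTC) to epoch seconds, the unit used by period1/period2."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())


start = clamp_start(date(1902, 1, 1), date(2020, 8, 28))
print(start, to_epoch(start))
```

If a range of exactly 100 years is still rejected, passing a smaller `max_years` leaves some margin.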

yahooquery processes the erroneous results as if they were valid, returning a dict rather than a dataframe. When a list of symbols is provided to Tickers, these errors are incorrectly intermixed with correct results for symbols Yahoo! doesn't complain about.

Try this URL as an example:

https://query2.finance.yahoo.com/v8/finance/chart/%5ETYVIX?period1=-2145898800&period2=1598652741&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com

It returns this:

{"chart":{"result":null,"error":{"code":"Unprocessable Entity","description":"1d data not available for startTime=-2145898800 and endTime=1598652741. Only 100 years worth of day granularity data are allowed to be fetched per request."}}}
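For illustration, a response like the one above can be recognized as an error before any dataframe conversion is attempted. This is a minimal sketch using only the standard library; `chart_error` is a hypothetical helper, not part of yahooquery, and it assumes the `chart.error` layout shown in the response above.

```python
import json


def chart_error(payload: str):
    """Return the error text from a chart response body, or None if it reports no error."""
    chart = json.loads(payload).get("chart") or {}
    error = chart.get("error")
    if error:
        # Prefer the human-readable description, fall back to the short code
        return error.get("description") or error.get("code")
    return None


# The response body quoted above, split across lines for readability
sample = (
    '{"chart":{"result":null,"error":{"code":"Unprocessable Entity",'
    '"description":"1d data not available for startTime=-2145898800 and '
    'endTime=1598652741. Only 100 years worth of day granularity data are '
    'allowed to be fetched per request."}}}'
)

print(chart_error(sample))
```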

yahooquery then returns this sort of erroneous blob, intermixing error strings with valid results:

[5281 rows x 6 columns], '^CASE30': '1d data not available for startTime=-2145898800 and endTime=1598653320. Only 100 years worth of day granularity data are allowed to be fetched per request.', '^HSI':                     high           low         close          open        volume      adjclose
1986-12-31   2568.300049   2568.300049   2568.300049   2568.300049  0.000000e+00   2568.300049
1987-01-02   2540.100098   2540.100098   2540.100098   2540.100098  0.000000e+00   2540.100098
1987-01-05   2552.399902   2552.399902   2552.399902   2552.399902  0.000000e+00   2552.399902
1987-01-06   2583.899902   2583.899902   2583.899902   2583.899902  0.000000e+00   2583.899902
1987-01-07   2607.100098   2607.100098   2607.100098   2607.100098  0.000000e+00   2607.100098

With these types:

<class 'dict'>
<class 'str'>
<class 'pandas.core.frame.DataFrame'>
<class 'str'>
<class 'pandas.core.frame.DataFrame'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'pandas.core.frame.DataFrame'>
<class 'str'>
<class 'str'>

The history method is documented as

        Returns
        -------
        pandas.DataFrame
            historical pricing data

That is not true in this case. It looks like the assumptions made in _historical_data_to_dataframe need some revision. I tried to figure out what use cases this was intended to handle, but there are no comments and it is not obvious to me.

Thanks to everyone who has contributed to this library and to @dpguthrie for putting it together.

shipmints avatar Aug 28 '20 22:08 shipmints

@dpguthrie Here's another case in the same vein. Run a query for historical prices on a symbol list that includes stocks and mutual funds, even if unintentional, and erroneous results are returned. e.g., try "BF-B" and "BF.B", one is Brown-Forman, a stock, the other a mutual fund which passes Yahoo!'s validation API just fine because they have data on file.

This produces the same kind of erroneous dict blob rather than a dataframe as the history API promises:

[5083 rows x 7 columns], 'BF.B': {'meta': {'currency': None, 'symbol': 'BF.B', 'exchangeName': 'YHD', 'instrumentType': 'MUTUALFUND', 'firstTradeDate': None, 'regularMarketTime': 1561759658, 'gmtoffset': -14400, 'timezone': 'EDT', 'exchangeTimezoneName': 'America/New_York', 'priceHint': 2, 'currentTradingPeriod': {'pre': {'timezone': 'EDT', 'start': 1598601600, 'end': 1598621400, 'gmtoffset': -14400}, 'regular': {'timezone': 'EDT', 'start': 1598621400, 'end': 1598644800, 'gmtoffset': -14400}, 'post': {'timezone': 'EDT', 'start': 1598644800, 'end': 1598659200, 'gmtoffset': -14400}}, 'dataGranularity': '1d', 'range': '', 'validRanges': ['1mo', '3mo', '6mo', 'ytd', '1y', '2y', '5y', '10y', 'max']}, 'indicators': {'quote': [{}], 'adjclose': [{}]}}, 'SLB':                   low    volume      close       high       open   adjclose  dividends  splits
1981-12-31  13.875000    449200  13.968750  14.000000  13.937500   1.809793        NaN     NaN
1982-01-04  13.500000    630400  13.687500  13.968750  13.750000   1.773355        NaN     NaN
1982-01-05  13.000000   1076800  13.000000  13.468750  13.250000   1.684282        NaN     NaN
1982-01-06  12.750000   1560400  12.875000  12.968750  12.875000   1.668086        NaN     NaN
1982-01-07  12.500000   1303600  12.625000  12.937500  12.718750   1.635697        NaN     NaN

shipmints avatar Aug 29 '20 17:08 shipmints

Thanks again for leaving some feedback.

This can either be documented a little better, changed slightly so only a dataframe is returned, or altered to return two separate elements (the dataframe and a dictionary containing symbols with no data). The idea behind returning the dictionary was so the user had an idea of what symbols weren't returning data. Initially, my thought was that the user could easily check and concat the dataframes themselves, but they would still know if any symbols were missing data. Something like:

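import pandas as pd
from yahooquery import Ticker

tickers = Ticker(symbols)  # symbols: a list or space-separated string of tickers
df = tickers.history()

# A dict comes back when some symbols have no data; keep only the dataframes
if isinstance(df, dict):
    df = pd.concat([df[x] for x in df if isinstance(df[x], pd.DataFrame)])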

I am certainly open to suggestions though (the dict blob is definitely not an ideal thing to work with).

dpguthrie avatar Aug 30 '20 20:08 dpguthrie

I'd think yahooquery should do this kind of work itself. It's particular to yahooquery's internals and, to my eye, not something you'd want all your users to have to bury in their code. I'd suggest returning a tuple of (good, bad) results, i.e., (df, dict), or a dict/namedtuple if you want to name the returned items, à la return dict(good=df, bad=errordict). Clearly a breaking change, though. You could instead complicate the history API (and any other affected APIs) with a flag, which is also messy. In the meantime, I'll adopt this logic into my code.
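As a rough sketch of the (good, bad) shape suggested here, written as user-side code rather than as yahooquery's actual API: `HistoryResult` and `split_history` are hypothetical names, and the input is assumed to be either a single dataframe or the mixed dict shown earlier in this thread.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical result type: good is one dataframe, bad maps symbol -> error string or raw dict
HistoryResult = namedtuple("HistoryResult", ["good", "bad"])


def split_history(result):
    """Separate dataframes from error entries in a history() return value."""
    if isinstance(result, pd.DataFrame):
        return HistoryResult(result, {})
    frames = {s: v for s, v in result.items() if isinstance(v, pd.DataFrame)}
    bad = {s: v for s, v in result.items() if not isinstance(v, pd.DataFrame)}
    # Concatenating a dict uses its keys (the symbols) as the outer index level
    good = pd.concat(frames) if frames else pd.DataFrame()
    return HistoryResult(good, bad)


mixed = {
    "A": pd.DataFrame({"close": [1.0, 2.0]}),
    "X": "1d data not available",
    "B": pd.DataFrame({"close": [3.0]}),
}
res = split_history(mixed)
print(res.good)
print(sorted(res.bad))
```

Unlike a plain concat of the dataframes, this keeps the symbol attached to each row and keeps the failed symbols inspectable.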

shipmints avatar Aug 30 '20 21:08 shipmints