utils.py: list index out of range

Open VanNostrand opened this issue 3 years ago • 7 comments

There is strange behaviour in yfinance 0.1.94 when I read the ticker "G7W.DU": sometimes it works, and sometimes utils.py raises a "list index out of range" error.

What I expect (and sometimes works):

$ python
Python 3.10.9 (main, Dec 11 2022, 14:50:46) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import yfinance as yf
>>> t = "G7W.DU"
>>> ticker = yf.Ticker(t)
>>> ticker.info["regularMarketPrice"]
97

What I often get:

$ python
Python 3.10.9 (main, Dec 11 2022, 14:50:46) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import yfinance as yf
>>> t = "G7W.DU"
>>> ticker = yf.Ticker(t)
>>> ticker.info["regularMarketPrice"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/foo/.local/lib/python3.10/site-packages/yfinance/ticker.py", line 147, in info
    return self.get_info()
  File "/home/foo/.local/lib/python3.10/site-packages/yfinance/base.py", line 742, in get_info
    self._get_info(proxy)
  File "/home/foo/.local/lib/python3.10/site-packages/yfinance/base.py", line 424, in _get_info
    data = utils.get_json(ticker_url, proxy, self.session)
  File "/home/foo/.local/lib/python3.10/site-packages/yfinance/utils.py", line 205, in get_json
    json_str = html.split('root.App.main =')[1].split(
IndexError: list index out of range

There seems to be something special about G7W.DU: I have tried 5 tickers so far and only that one produces this error.

VanNostrand avatar Dec 19 '22 15:12 VanNostrand

okay I will look into this issue

keenborder786 avatar Dec 19 '22 16:12 keenborder786

Easy fix, try Git branch quotes-html-parsing

ValueRaider avatar Dec 19 '22 16:12 ValueRaider

I traced the error back and printed the HTML content fetched by get_json(). After running it several times, it turns out that when you get the answer 97, the HTML is returned properly, as follows:

<script id="wafer-caas-config" type="application/json">{"caasUrl":"https://www.yahoo.com/caas/content/article/","contextParams":"appid=article2_csn&bucket=HPMODALMAST100,FPSATE101,FPDOGFOOD202,finance-US-en-US-def&device=desktop&features=enableAdFeedbackV2,enableInArticleAd,enableSlideShowKV,enableVideoDocking,ncp,oathPlayer,outStream,enableXrayTickerEntities,enableXrayNcp,enableXrayHyperloopCards,enableXrayCardsFollowButton,enableAdLiteUpSellFeedback,enableAdSlotsOneSlot,enableSingleSlotting,exposeYctIds,showCommentsIconInShareSec,enableBodySlots,enableContentMeta,enableUpsellFirstArticleOnly,enableStockChart,enableStockChartWatchlist&lang=en-US&region=US&site=finance"}</script>
<script src="https://s.yimg.com/zz/combo?s:aaq/vzm/cs_1.4.0.js&s:os/yaft/yaft-0.3.27.min.js&s:os/yaft/yaft-plugin-aftnoad-0.1.5.min.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-core-1.61.0.js"></script>
<script src="https://s.yimg.com/aaq/wf/wf-rapid-1.9.1.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-bind-1.1.3.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-fetch-1.18.10.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-form-1.31.1.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-image-1.4.0.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-menu-1.1.8.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-tabs-1.12.6.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-toggle-1.15.4.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-tooltip-1.1.3.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-beacon-1.3.3.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-caas-1.18.3.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-darla-1.6.1.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-loader-2.1.18.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-sticky-1.1.0.js" defer></script>
<script src="https://s.yimg.com/aaq/wf/wf-template-1.4.3.js" defer></script>
<script src="https://s.yimg.com/aaq/hp-viewer/desktop_1.10.334.js"></script>
<script src="https://s.aolcdn.com/membership/omp-static/omp-widgets/2.0.0/switch-widget.prod.js"></script>
<script src="https://s.yimg.com/uc/finance/dd-site/js/main.c7e79f6b5910161a9af8.min.js" defer></script>
<script src="https://s.yimg.com/aaq/pv/perf-vitals_2.1.1.js" async></script>
<script>
        (function () {
            var w = window.wafer || {};
            typeof w.ready === 'function' && w.ready(function () {
                typeof w.on === 'function' && w.on('tab:selected', function (e) {
                    try {
                        if (e && e.meta && e.meta.targetElem.id === 'header-notification-menu') {
                            window.setTimeout(function hideBadge() {w.base.state = {financeNotification:{hideBadge:'1'}};}, 250);
                            var rapidEvent = w.base.state.financeNotification.i13n.showPanel;
                            window.rapidInstance.beaconEvent(rapidEvent.event, rapidEvent.data, rapidEvent.outcm);
                        }
                    } catch (ignore) {}
                });
            }, window);
        })();</script>
<script>window.webpackPublicPath='https://s.yimg.com/uc/finance/dd-site/js/';</script></body></html>

However, when you get the IndexError, the HTML fetched by get_json() is as follows:

<html>
  <meta charset='utf-8'>
  <script>
    if (window != window.top) {
      document.write(' < p > Content is currently unavailable. < /p> < img src = "//geo.yahoo.com/p?s=1197757039&t='+new Date().getTime()+'&_R='+encodeURIComponent(document.referrer)+'&err=404&err_url='+'https%3A%2F%2Ffinance.yahoo.com%2Fquote%2FG7W.DU'+'"
          width = "0px"
          height = "0px" / > ');}else{window.location.replace('
          https: //www.yahoo.com/?err=404&err_url=https%3A%2F%2Ffinance.yahoo.com%2Fquote%2FG7W.DU');}
  </script>
  <noscript>
    <META http-equiv="refresh" content="0;URL='https://www.yahoo.com/?err=404&err_url=https%3A%2F%2Ffinance.yahoo.com%2Fquote%2FG7W.DU'">
  </noscript>
</html>

This means the problem is not in the code but with the ticker: its data becomes unavailable on Yahoo Finance at random moments, for reasons unknown.

I also opened the ticker page in my browser, which confirmed my hypothesis: sometimes the page loads and sometimes it returns a 404 error.
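The failure mode can be reproduced without any network access. The following is a minimal sketch that assumes only the marker string from the traceback; `extract_after_marker` is a hypothetical helper for illustration, not yfinance's code and not the fix in the PR.

```python
MARKER = "root.App.main ="  # marker string taken from the traceback


def extract_after_marker(html, marker=MARKER):
    """Return the text after `marker`, or None if the marker is absent.

    Hypothetical helper for illustration; not part of yfinance.
    """
    # str.partition never raises: if the marker is missing, `sep` is "".
    _, sep, tail = html.partition(marker)
    return tail if sep else None


# Stand-in for the 404 page Yahoo sometimes returns (no marker inside).
error_page = "<html><script>/* err=404 */</script></html>"

# split() on a missing marker yields a one-element list, so indexing
# it with [1] is what raises "IndexError: list index out of range".
parts = error_page.split(MARKER)
print(len(parts))                        # 1
print(extract_after_marker(error_page))  # None
```

Checking for the marker first turns the crash into a value the caller can test for.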

The best solution for your downstream code is to wrap the call in a try/except statement.
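A minimal sketch of such a guard, assuming the caller passes in a zero-argument function that returns the info dict. `safe_lookup` and its parameters are hypothetical names chosen for this example, not part of yfinance.

```python
def safe_lookup(fetch_info, key, default=None):
    """Return fetch_info()[key], or `default` if the lookup fails.

    `fetch_info` is any zero-argument callable returning a dict
    (a hypothetical interface chosen for this sketch).
    """
    try:
        info = fetch_info()
    except IndexError:
        # This is the exception shown in the traceback for 0.1.94.
        return default
    # The key can also be missing when Yahoo returns incomplete data.
    return (info or {}).get(key, default)


# Possible usage with yfinance (requires network access):
#   import yfinance as yf
#   price = safe_lookup(lambda: yf.Ticker("G7W.DU").info,
#                       "regularMarketPrice")
```

Catching only IndexError keeps unrelated failures, such as network errors, visible to the caller.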

PS: @ValueRaider has also created a PR (#1257) that handles such bad tickers by returning an empty dict.

keenborder786 avatar Dec 19 '22 18:12 keenborder786

It seems to work now, I am using the code from the PR and changed my program.

After adding the Python logger to my program, I can now see the debug messages from the URL access, which sometimes results in HTTP 404 and sometimes in HTTP 200:

DEBUG:root:checking Games Workshop
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/G7W.DU HTTP/1.1" 404 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query1.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query1.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/G7W.DU?symbol=G7W.DU&type=trailingPegRatio&period1=1655754888&period2=1671569688 HTTP/1.1" 200 99
INFO:root:Got bad ticker for G7W.DU, retrying...
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/G7W.DU HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query1.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query1.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/G7W.DU?symbol=G7W.DU&type=trailingPegRatio&period1=1655754890&period2=1671569690 HTTP/1.1" 200 99
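Output like the above can be obtained with the standard library's logging module. This is a sketch of one possible setup, not necessarily the exact configuration used here; it relies on the requests library sending HTTP traffic through urllib3, which logs each connection and request line at DEBUG level.

```python
import logging

# force=True (Python 3.8+) replaces any handlers configured earlier,
# so the DEBUG level is applied even if logging was already set up.
logging.basicConfig(level=logging.DEBUG, force=True)

# urllib3 reports new connections and request lines on this logger;
# it inherits the DEBUG level from the root logger configured above.
http_log = logging.getLogger("urllib3.connectionpool")

# The default format is LEVEL:logger:message, so this call prints
# "DEBUG:root:checking Games Workshop".
logging.debug("checking %s", "Games Workshop")
```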

Now I assign the ticker.info content to a variable once, check that all entries I require are present, repeat if necessary, and work with that set of data without calling ticker.info again. I also noticed that ticker.info refreshes its content every time I access it (e.g. ticker.info["regularMarketPrice"] first, then ticker.info["currency"] afterwards). It is likely a feature, but I had not noticed it until now.

VanNostrand avatar Dec 19 '22 21:12 VanNostrand

The "issue" was that Yahoo was not returning data for the request https://finance.yahoo.com/quote/G7W.DU . Which is correct - visit that page and it's full of N/As. yfinance used to handle this but that got lost in the fix.

It sounds like you are hacking the requests or info creation, but I can't tell - you should not have to interfere at all.

ValueRaider avatar Dec 19 '22 21:12 ValueRaider

It sounds like you are hacking the requests or info creation

Not really. I basically did what I posted in the minimal program: reading the ticker info and extracting the values of interest (by calling ticker.info[...] a couple of times for different entries). I don't know, maybe that is not best practice.

So it was two things coming together, my approach of implicitly refreshing the data and Yahoo's data issue. I printed the ticker info first and found everything in place, then I successfully called ticker.info["regularMarketPrice"], and then I tried to call ticker.info["currency"], which raised an exception telling me there is no such key, even though I had seen it in the print (with a proper value too). Printing the object again showed that it was nearly empty and the key was indeed gone; the price of 97 was now None as well. That shows that the object changes and that Yahoo sometimes sends incomplete data.

So in addition to the patch, the solution for my code is now to read ticker.info just once into a variable, verify the content, and work with that data without using ticker.info further. The examples in the documentation don't mention that every call updates the object; they just show that this is the way to read the ticker info.
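The read-once-and-verify approach might look roughly like the sketch below. `snapshot_info`, its parameters, and the retry defaults are hypothetical choices for this example; only ticker.info and the two key names come from this thread.

```python
import time


def snapshot_info(fetch_info, required_keys, attempts=3, delay=1.0):
    """Return the first snapshot containing all required keys, else None.

    `fetch_info` is a zero-argument callable returning a dict
    (a hypothetical interface chosen for this sketch).
    """
    for attempt in range(attempts):
        # Copy the result so a later refresh cannot change the snapshot.
        info = dict(fetch_info() or {})
        if all(info.get(key) is not None for key in required_keys):
            return info
        if attempt < attempts - 1:
            time.sleep(delay)  # brief pause before asking again
    return None


# Possible usage with yfinance (requires network access):
#   import yfinance as yf
#   ticker = yf.Ticker("G7W.DU")
#   info = snapshot_info(lambda: ticker.info,
#                        ["regularMarketPrice", "currency"])
#   if info is not None:
#       print(info["regularMarketPrice"], info["currency"])
```

All later lookups then go through the returned dict, so they see one consistent set of values.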

VanNostrand avatar Dec 19 '22 22:12 VanNostrand

I understand now: info is being recreated on every call. That's a bug that only happens with this ticker. Possibly the repeated calls were "annoying" Yahoo, causing it to return less data in subsequent calls; this happens.

~Checkout branch r0.1/fix/info-not-caching~ Now merged into 0.1

EDIT: This is already fixed in 0.2.1 version which we've just released.

ValueRaider avatar Dec 19 '22 22:12 ValueRaider