
Inconsistent results - a problem with yfinance or Yahoo?

Open drtonyr opened this issue 3 years ago • 18 comments

I'm getting inconsistent results, for example the code:

import yfinance as yf

print(yf.download('ISF.L ECAR.L', start='2021-03-01', end='2021-03-12', interval='1d', auto_adjust=True, group_by='ticker'))
print(yf.download('ECAR.L ISF.L', start='2021-03-01', end='2021-03-12', interval='1d', auto_adjust=True, group_by='ticker'))

Yields:

[*********************100%***********************]  2 of 2 completed
                 ISF.L                                              ECAR.L                              
                  Open        High         Low       Close   Volume   Open   High    Low   Close  Volume
Date                                                                                                    
2021-03-01  900.299988  910.788025  900.299988  906.549988  73976.0  6.262  6.442  6.262  6.4050    6953
2021-03-02  902.900024  915.094971  901.882996  913.000000  15094.0  6.469  6.469  6.401  6.4010   17332
2021-03-03  921.799988  925.900024  916.693970  921.500000  27767.0  6.362  6.469  6.362  6.3840    5011
2021-03-04  913.900024  922.000000  911.099976  920.400024  27720.0  6.350  6.350  6.141  6.1410    2045
2021-03-05  902.000000  910.599976  902.000000  905.000000  13989.0  6.236  6.390  6.220  6.3900    9125
2021-03-08         NaN         NaN         NaN         NaN      NaN  7.458  7.770  7.397  7.5885  340126
2021-03-09         NaN         NaN         NaN         NaN      NaN  7.483  7.693  7.483  7.6640  263112
2021-03-10         NaN         NaN         NaN         NaN      NaN  7.630  7.752  7.600  7.7010  233957
2021-03-11         NaN         NaN         NaN         NaN      NaN  7.808  7.898  7.769  7.8430  227326
[*********************100%***********************]  2 of 2 completed

When I check the yahoo website using https://uk.finance.yahoo.com/quote/ECAR.L/history?p=ECAR.L it returns:

                 ISF.L                                              ECAR.L                              
                  Open        High         Low       Close   Volume   Open   High    Low   Close  Volume
Date                                                                                                    
2021-03-01  900.299988  910.788025  900.299988  906.549988  73976.0  7.683  7.760  7.613  7.7300  229947
2021-03-02  902.900024  915.094971  901.882996  913.000000  15094.0  7.740  7.764  7.668  7.7160  157541
2021-03-03  921.799988  925.900024  916.693970  921.500000  27767.0  7.812  7.842  7.668  7.7800  335061
2021-03-04  913.900024  922.000000  911.099976  920.400024  27720.0  7.657  7.672  7.505  7.5760  210196
2021-03-05  902.000000  910.599976  902.000000  905.000000  13989.0  7.448  7.582  7.310  7.3270  202433
2021-03-08         NaN         NaN         NaN         NaN      NaN  7.458  7.770  7.397  7.5885  340126
2021-03-09         NaN         NaN         NaN         NaN      NaN  7.483  7.693  7.483  7.6640  263112
2021-03-10         NaN         NaN         NaN         NaN      NaN  7.630  7.752  7.600  7.7010  233957
2021-03-11         NaN         NaN         NaN         NaN      NaN  7.808  7.898  7.769  7.8430  227326

I'm running the latest code. It's probably a Yahoo problem because it's not consistent: often it works. I've no reason to believe the order of tickers makes a difference; this is just one example where it failed. I'm trying to download 20 years of data for about 300 tickers and I always get many instances of something like this.

I could scrape Yahoo directly as the data seems to be good on the web page, but maybe this is a known problem?

drtonyr avatar Mar 13 '21 14:03 drtonyr

Does this error persist when you try to collect each symbol separately instead of together?

silvavn avatar Mar 14 '21 04:03 silvavn

Thanks for your interest.

The error does persist even with a single ticker. What's more, it's slow to change between being correct and being wrong. Here is some more debugging code:

import time
import yfinance as yf

# Poll the same one-day window once a second and log whenever the value changes.
last = None
for n in range(2**16):
    data = yf.download('ECAR.L', start='2021-03-05', end='2021-03-06', interval='1d')
    curr = data.to_numpy()[0, 0]  # first row, first column (the Open price)
    if curr != last:
        print(time.asctime(time.gmtime()), curr)
    last = curr
    time.sleep(1)

Which I've just run and it gives output like:

Sun Mar 14 13:38:05 2021 7.447999954223633
Sun Mar 14 13:52:04 2021 6.236000061035156
Sun Mar 14 13:52:07 2021 7.447999954223633
Sun Mar 14 13:52:09 2021 6.236000061035156
Sun Mar 14 14:02:06 2021 7.447999954223633
Sun Mar 14 14:02:07 2021 6.236000061035156
Sun Mar 14 14:02:08 2021 7.447999954223633
Sun Mar 14 14:02:09 2021 6.236000061035156
Sun Mar 14 14:02:10 2021 7.447999954223633
Sun Mar 14 15:04:35 2021 6.236000061035156
Sun Mar 14 15:04:38 2021 7.447999954223633
Sun Mar 14 15:04:39 2021 6.236000061035156

So the returned values are stable for long periods of time (many minutes) and have noisy changeovers between the stable values.

If I were asked to speculate about the cause, I'd say that the URL used is being resolved to different servers and some of them have mangled data. That is, it doesn't look like a yfinance problem to me; it looks like a Yahoo problem.

drtonyr avatar Mar 14 '21 15:03 drtonyr

I have a theory (which might be wrong) that Yahoo may have implemented some form of differential privacy (adding random noise). Maybe try comparing the candle data for the same date with Google Finance or Bloomberg.

This is an interesting error and might make Yahoo an unreliable data source if it is confirmed for many tickers.

silvavn avatar Mar 14 '21 15:03 silvavn

If it's deliberate, it's a very, very strange decision. The step change between 6.2 and 7.4 is large, and I see bigger changes elsewhere.

Earlier I believed this didn't show on the website. I've just seen it there, so it's a fundamental Yahoo problem. https://uk.finance.yahoo.com/quote/ECAR.L/history?period1=1584099813&period2=1615635813&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true

drtonyr avatar Mar 14 '21 15:03 drtonyr

Hi all, I have the exact same issue when I retrieve Australian stock data (AX). Not sure how to fix it.

jackyclever avatar Mar 16 '21 19:03 jackyclever

@jackyclever there is no fix if yahoo is providing us with wrong data.

silvavn avatar Mar 16 '21 19:03 silvavn

Well, the boundary between good and bad data is always in the same place so, in theory, you could make multiple calls to get all the data. I've considered this, but I'll probably end up scraping iShares directly as much of what I want isn't on yahoo.
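
The boundary mentioned above can be located mechanically. What follows is a minimal sketch, assuming that a day-over-day move larger than 10% marks a suspect seam. The threshold and the helper name `find_jumps` are illustrative and not part of yfinance. It is run on the ECAR.L closes copied from the two tables in the first post:

```python
def find_jumps(closes, threshold=0.10):
    """Return (date, relative change) for each close that moves more than
    `threshold` relative to the previous trading day's close."""
    jumps = []
    dates = sorted(closes)  # ISO date strings sort chronologically
    for prev, curr in zip(dates, dates[1:]):
        change = closes[curr] / closes[prev] - 1.0
        if abs(change) > threshold:
            jumps.append((curr, round(change, 4)))
    return jumps


# ECAR.L closes from the first table (the one with the lower early values).
suspect = {
    "2021-03-01": 6.4050, "2021-03-02": 6.4010, "2021-03-03": 6.3840,
    "2021-03-04": 6.1410, "2021-03-05": 6.3900, "2021-03-08": 7.5885,
    "2021-03-09": 7.6640, "2021-03-10": 7.7010, "2021-03-11": 7.8430,
}

# ECAR.L closes from the second table.
consistent = {
    "2021-03-01": 7.7300, "2021-03-02": 7.7160, "2021-03-03": 7.7800,
    "2021-03-04": 7.5760, "2021-03-05": 7.3270, "2021-03-08": 7.5885,
    "2021-03-09": 7.6640, "2021-03-10": 7.7010, "2021-03-11": 7.8430,
}

print(find_jumps(suspect))     # flags only 2021-03-08
print(find_jumps(consistent))  # []
```

A genuine price move can also exceed the threshold, so a flagged date is only a hint about where to re-request data, not proof that the data is wrong.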

drtonyr avatar Mar 16 '21 19:03 drtonyr

This is a little piece of code I have written to test the data:

import yfinance as yf
from datetime import datetime, timedelta
import pandas as pd

endDate = datetime.today()
startDate = endDate - timedelta(days=int(5*365))
allTickers = pd.read_csv('Data/asx200.csv')['Ticker'].tolist()
last = yf.download(allTickers, start=startDate, end=endDate)
curr = yf.download(allTickers, start=startDate, end=endDate)
dfCompare = last.compare(curr)
dfCompare.to_csv('compare test.csv')

When I run this on my local machine, the data is very stable, but when I run it on an Azure VM, it becomes inconsistent. The Python version on the VM is 3.9.1, and on the local machine it is 3.8.5.

Any ideas?

jackyclever avatar Mar 17 '21 01:03 jackyclever

Hi @drtonyr, I may be seeing the exact same issue:

When I fetch the historical data for UIMM.DE, I also get this inconsistent data. For me the problem is that the currency of the ticker is the euro, but I randomly also see dollar prices in my pandas data.

And since your lower values are also about 20% lower, it looks like 7.44 is in dollars and 6.24 has somehow been converted to euros.
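
A quick way to probe this hypothesis against the numbers already in the thread is to divide the lower ECAR.L open prices by the higher ones, day by day. This is only a sketch: the variable names are illustrative, and the values are copied from the two tables in the first post.

```python
# ECAR.L open prices for the same dates, from the two tables in the first post.
low_series = {
    "2021-03-01": 6.262, "2021-03-02": 6.469, "2021-03-03": 6.362,
    "2021-03-04": 6.350, "2021-03-05": 6.236,
}
high_series = {
    "2021-03-01": 7.683, "2021-03-02": 7.740, "2021-03-03": 7.812,
    "2021-03-04": 7.657, "2021-03-05": 7.448,
}

# Ratio of the lower value to the higher value for each date.
ratios = {day: low_series[day] / high_series[day] for day in low_series}

for day, ratio in ratios.items():
    print(day, round(ratio, 4))

print("min", round(min(ratios.values()), 4), "max", round(max(ratios.values()), 4))
```

The ratios land between roughly 0.81 and 0.84 rather than on one constant, and the volumes in the two tables differ as well, so a single fixed conversion factor would not reproduce the lower series exactly.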

mguski avatar May 25 '21 20:05 mguski

Any progress on this?

GF-Huang avatar Jun 19 '21 17:06 GF-Huang

My suspicion is that it might depend on which server Yahoo routes you to and on the IP address that you are using. Again, this is not a yfinance bug.

silvavn avatar Jun 19 '21 23:06 silvavn

Do you mean that if I can guarantee that my address and the address returned by the DNS query for Yahoo's domain stay the same, then the data returned will stay the same?

GF-Huang avatar Jun 20 '21 05:06 GF-Huang

Do you mean that if I can guarantee that my address and the address returned by the DNS query for Yahoo's domain stay the same, then the data returned will stay the same?

No one can guarantee anything when we don't have control over Yahoo's API, data and servers.

silvavn avatar Jun 20 '21 07:06 silvavn

One question, @silvavn: what if the error doesn't persist when you try to collect each symbol separately? Many thanks

RogerGR98 avatar Sep 14 '21 10:09 RogerGR98

@RogerGR98 There are too many variables involved to be sure. We know that collecting in different time zones and using different Yahoo servers (e.g. ca, uk, etc.) can return different results. Ultimately, this is a Yahoo data problem and not a yfinance problem.

silvavn avatar Oct 13 '21 16:10 silvavn

Instead of returning old data, Yahoo now blocks requests entirely with a 403 if you crawl too much.

marcokrueger avatar Oct 23 '21 11:10 marcokrueger

Interesting bug. I'd like to know what the current settings for the request headers are and what URL is used for the scrape. I was using my own request headers in #903, and perhaps overriding the default ones would help, while also ensuring you're not running on cached data (which I suspect is very likely on large data loads).

eabase avatar Dec 22 '21 21:12 eabase

import yfinance
t = yfinance.Ticker("RY.TO")
t.cashflow["2021-10-31"]["Capital Expenditures"]
# -2186000000.0 check on https://ca.finance.yahoo.com/quote/RY.TO/cash-flow?p=RY.TO and the value is as expected.

t.cashflow["2021-10-31"]["Total Cash From Operating Activities"]
# -27832000000.0 same link, expected something like 61,044,000,000

Any idea why it's so different between the library and the site?

kentrosi avatar Jul 23 '22 20:07 kentrosi

Is anyone else still experiencing this problem? I think I have a solution but need someone to test.

My idea: Yahoo returns a currency attribute with the price data. Where the price data changes, hopefully the currency does too. The solution would be to add a 'currency' attribute to the returned table.
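
To make the idea concrete, here is a rough sketch of how a currency label could travel with each price row. It assumes the response carries the currency in a `meta` block next to the timestamps and closes. The dictionary layout, the helper names `rows_with_currency` and `currencies_seen`, and the sample values are all made up for illustration; none of this is taken from yfinance.

```python
from datetime import datetime, timezone


def rows_with_currency(payload):
    """Flatten one chart-style payload into (date, close, currency) rows."""
    result = payload["chart"]["result"][0]
    currency = result["meta"].get("currency")  # None if the field is absent
    closes = result["indicators"]["quote"][0]["close"]
    rows = []
    for ts, close in zip(result["timestamp"], closes):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        rows.append((day, close, currency))
    return rows


def currencies_seen(rows):
    """Return the set of currency labels present in the rows."""
    return {currency for _, _, currency in rows}


def make_sample(currency, close):
    # Hypothetical payload for a single day (timestamp 1614931200 is 2021-03-05 08:00 UTC).
    return {"chart": {"result": [{
        "meta": {"currency": currency},
        "timestamp": [1614931200],
        "indicators": {"quote": [{"close": [close]}]},
    }]}}


# Two made-up responses for the same day that disagree on price and currency.
first = rows_with_currency(make_sample("EUR", 100.0))
second = rows_with_currency(make_sample("USD", 120.0))

labels = currencies_seen(first + second)
if len(labels) > 1:
    print("mixed currencies across requests:", sorted(labels))
```

If the label really does change together with the prices, a check like `currencies_seen` would let a caller discard or convert the mismatched rows instead of mixing them silently.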

ValueRaider avatar Oct 26 '22 21:10 ValueRaider