Inconsistent results - a problem with yfinance or Yahoo?
I'm getting inconsistent results. For example, this code:
import yfinance.yfinance as yf
print(yf.download('ISF.L ECAR.L', start='2021-03-01', end='2021-03-12', interval='1d', auto_adjust=True, group_by='ticker'))
print(yf.download('ECAR.L ISF.L', start='2021-03-01', end='2021-03-12', interval='1d', auto_adjust=True, group_by='ticker'))
[*********************100%***********************] 2 of 2 completed
Open High Low Close Volume Open High Low Close Volume
2021-03-01 900.299988 910.788025 900.299988 906.549988 73976.0 6.262 6.442 6.262 6.4050 6953
2021-03-02 902.900024 915.094971 901.882996 913.000000 15094.0 6.469 6.469 6.401 6.4010 17332
2021-03-03 921.799988 925.900024 916.693970 921.500000 27767.0 6.362 6.469 6.362 6.3840 5011
2021-03-04 913.900024 922.000000 911.099976 920.400024 27720.0 6.350 6.350 6.141 6.1410 2045
2021-03-05 902.000000 910.599976 902.000000 905.000000 13989.0 6.236 6.390 6.220 6.3900 9125
2021-03-08 NaN NaN NaN NaN NaN 7.458 7.770 7.397 7.5885 340126
2021-03-09 NaN NaN NaN NaN NaN 7.483 7.693 7.483 7.6640 263112
2021-03-10 NaN NaN NaN NaN NaN 7.630 7.752 7.600 7.7010 233957
2021-03-11 NaN NaN NaN NaN NaN 7.808 7.898 7.769 7.8430 227326
[*********************100%***********************] 2 of 2 completed
When I check the yahoo website using https://uk.finance.yahoo.com/quote/ECAR.L/history?p=ECAR.L it returns:
Open High Low Close Volume Open High Low Close Volume
2021-03-01 900.299988 910.788025 900.299988 906.549988 73976.0 7.683 7.760 7.613 7.7300 229947
2021-03-02 902.900024 915.094971 901.882996 913.000000 15094.0 7.740 7.764 7.668 7.7160 157541
2021-03-03 921.799988 925.900024 916.693970 921.500000 27767.0 7.812 7.842 7.668 7.7800 335061
2021-03-04 913.900024 922.000000 911.099976 920.400024 27720.0 7.657 7.672 7.505 7.5760 210196
2021-03-05 902.000000 910.599976 902.000000 905.000000 13989.0 7.448 7.582 7.310 7.3270 202433
2021-03-08 NaN NaN NaN NaN NaN 7.458 7.770 7.397 7.5885 340126
2021-03-09 NaN NaN NaN NaN NaN 7.483 7.693 7.483 7.6640 263112
2021-03-10 NaN NaN NaN NaN NaN 7.630 7.752 7.600 7.7010 233957
2021-03-11 NaN NaN NaN NaN NaN 7.808 7.898 7.769 7.8430 227326
I'm running the latest code. It's probably a Yahoo problem because it's not consistent: it often works. I've no reason to believe the order of tickers makes a difference; this is just one example where it failed. I'm trying to download 20 years of data for about 300 tickers and I always get many instances of something like this.
I could scrape Yahoo directly as the data seems to be good on the web page, but maybe this is a known problem?
Does this error persist when you try to collect each symbol separately instead of together?
Thanks for your interest.
The error does persist even with a single ticker. What's more, it's slow to change between being correct and being wrong. Here is some more debugging code:
import time
import yfinance.yfinance as yf
last = None
for n in range(2**16):
    data = yf.download('ECAR.L', start='2021-03-05', end='2021-03-06', interval='1d')
    curr = data.to_numpy()[0,0]
    if curr != last:
        print(time.asctime(time.gmtime()), curr)
        last = curr
I've just run this and it gives output like:
Sun Mar 14 13:38:05 2021 7.447999954223633
Sun Mar 14 13:52:04 2021 6.236000061035156
Sun Mar 14 13:52:07 2021 7.447999954223633
Sun Mar 14 13:52:09 2021 6.236000061035156
Sun Mar 14 14:02:06 2021 7.447999954223633
Sun Mar 14 14:02:07 2021 6.236000061035156
Sun Mar 14 14:02:08 2021 7.447999954223633
Sun Mar 14 14:02:09 2021 6.236000061035156
Sun Mar 14 14:02:10 2021 7.447999954223633
Sun Mar 14 15:04:35 2021 6.236000061035156
Sun Mar 14 15:04:38 2021 7.447999954223633
Sun Mar 14 15:04:39 2021 6.236000061035156
So the returned values are stable for long periods of time (very many minutes) and there are noisy changeovers between the stable values.
If I were asked to speculate about what the problem is, I'd say that the URL used is being resolved to different servers and some of them have mangled data. That is, it doesn't look like a yfinance problem to me; it looks like a Yahoo problem.
I have a theory (which might be wrong) that Yahoo may have implemented some form of differential privacy, i.e. it adds random noise. Maybe try comparing the candle data for the same date with Google Finance or Bloomberg.
This is an interesting error and, if it is confirmed for many tickers, it might make Yahoo an unreliable data source.
If it's deliberate it's a very very strange decision. The step change between 6.2 and 7.4 is great, and I see bigger changes elsewhere.
Earlier I believed this didn't show on the web site - I've just seen it so it's a fundamental Yahoo problem.
Hi all, I have the exact same issue when I retrieve Australian stock data (AX). Not sure how to fix it.
@jackyclever there is no fix if Yahoo is providing us with wrong data.
Well, the boundary between good and bad data is always in the same place so, in theory, you could make multiple calls to get all the data. I've considered this, but I'll probably end up scraping iShares directly as much of what I want isn't on Yahoo.
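One way to read that suggestion is to split the requested period at the known boundary and request each part separately. The helper below is a hypothetical sketch, not part of yfinance: `split_range` only computes the date windows, and the `yf.download` call is left as a comment because whether separate calls actually return clean data has not been tested here. The boundary date is taken from the example output earlier in the thread.

```python
from datetime import date

def split_range(start, end, boundary):
    """Split the half-open range [start, end) at boundary, if it lies inside."""
    if not (start < boundary < end):
        return [(start, end)]
    return [(start, boundary), (boundary, end)]

# 2021-03-08 is where the good and bad rows met in the example above.
windows = split_range(date(2021, 3, 1), date(2021, 3, 12), date(2021, 3, 8))
for s, e in windows:
    # Each window could then be fetched on its own, for example:
    # yf.download('ECAR.L', start=s.isoformat(), end=e.isoformat(), interval='1d')
    print(s.isoformat(), e.isoformat())
```

The windows are half-open so that no day is requested twice.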
This is a little piece of code I have written to test the data:
import yfinance as yf
from datetime import datetime, timedelta
import pandas as pd
endDate = datetime.today()
startDate = endDate - timedelta(days=int(5*365))
allTickers = pd.read_csv('Data/asx200.csv')['Ticker'].tolist()
last = yf.download(allTickers, start=startDate, end=endDate)
curr = yf.download(allTickers, start=startDate, end=endDate)
dfCompare = last.compare(curr)
dfCompare.to_csv('compare test.csv')
When I run this on my local machine, the data is very stable, but when I run it on an Azure VM, it becomes inconsistent. The Python version on the VM is 3.9.1, and on the local machine it is 3.8.5.
Any ideas?
Hi @drtonyr, I may be seeing the exact same issue:
When I fetch the historical data for UIMM.DE, I also get this inconsistent data. For me the problem is that the currency of the ticker is euros, but dollar prices also randomly show up in my pandas data.
And since your lower values are also roughly 20% below the higher ones, it looks like 7.44 is in dollars and 6.24 has somehow been converted to euros.
Any progress on this?
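As a rough check of the currency idea, the ratio between the two sets of ECAR.L Open prices posted earlier in the thread can be computed per day. This is only a sketch using values copied from the two outputs above. If one series were simply the other converted at a daily exchange rate, the ratios should track the FX rate on those dates, which would need checking against a separate source.

```python
# Open prices for ECAR.L on 2021-03-01..05, copied from the two outputs above.
dates = ["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04", "2021-03-05"]
lower_opens = [6.262, 6.469, 6.362, 6.350, 6.236]
higher_opens = [7.683, 7.740, 7.812, 7.657, 7.448]

def ratios(a, b):
    """Element-wise ratio a[i] / b[i]."""
    return [x / y for x, y in zip(a, b)]

open_ratios = ratios(lower_opens, higher_opens)
for day, r in zip(dates, open_ratios):
    print(day, round(r, 4))
```

Note that the volumes in the two outputs differ as well, so a pure price conversion may not be the whole story.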
My suspicion is that it might depend on the server that Yahoo returns and the IP address that you are using. Again, this is not a yfinance bug.
Do you mean that if I can guarantee that my address and the address returned by the DNS query for Yahoo's domain stay the same, then the data returned will also stay the same?
No one can guarantee anything when we don't have control over Yahoo's API, data and servers.
One question @silvavn: what if the error doesn't persist when you try to collect each symbol separately? Many thanks.
@RogerGR98 There are too many variables involved to be sure. We know that collecting at different timezones and using different yahoo servers (e.g. ca, uk, etc) can return different results. Ultimately, this is a Yahoo data problem and not yfinance.
Instead of returning old data, Yahoo now blocks requests entirely with a 403 if you crawl too much.
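If the 403 responses are a reaction to request volume, slowing down and retrying with a growing delay is one thing to try. The sketch below is an assumption-laden illustration, not yfinance behaviour: `Blocked` and `fetch_with_backoff` are hypothetical names, and `fetch` stands for whatever callable performs the download and raises `Blocked` when it sees a 403.

```python
import time

class Blocked(Exception):
    """Raised by the (hypothetical) fetch callable when the server answers 403."""

def fetch_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), doubling the wait after each Blocked error."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return fetch()
        except Blocked:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= 2

# Demo with a stub that is blocked once and then succeeds; no real waiting.
state = {"calls": 0}
def stub_fetch():
    state["calls"] += 1
    if state["calls"] == 1:
        raise Blocked()
    return "data"

print(fetch_with_backoff(stub_fetch, sleep=lambda seconds: None))
```

Passing `sleep` in as a parameter keeps the helper easy to test without actually waiting.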
Interesting bug. I'd like to know what the current settings for the request headers are and which URL is used for the scrape. I was using my own request headers in #903, and perhaps overriding the default ones would help, while also ensuring you're not running on cached data (which I suspect is very likely on large data loads).
import yfinance
t = yfinance.Ticker("RY.TO")
t.cashflow["2021-10-31"]["Capital Expenditures"]
# -2186000000.0 check on https://ca.finance.yahoo.com/quote/RY.TO/cash-flow?p=RY.TO and the value is as expected.
t.cashflow["2021-10-31"]["Total Cash From Operating Activities"]
# -27832000000.0 same link, but I expected something like 61,044,000,000
Any idea why it's so different between the library and the site?
Is anyone else still experiencing this problem? I think I have a solution but need someone to test.
My idea is this: Yahoo returns a currency attribute with the price data. Where the price data is changing, hopefully the currency is changing too. The solution would be to add a 'currency' attribute to the returned table.
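A minimal sketch of that idea, assuming each response carries a currency code alongside its price rows. The `responses` structure and both helpers are hypothetical stand-ins, not yfinance's or Yahoo's actual format. The prices are the two Open values seen earlier in the thread, and `CUR_A` / `CUR_B` are placeholder labels, not real currency codes.

```python
def tag_currency(responses):
    """Flatten responses into rows that each carry the currency they came with."""
    rows = []
    for resp in responses:
        for day, price in resp["prices"]:
            rows.append({"date": day, "open": price, "currency": resp["currency"]})
    return rows

def mixed_currency_dates(rows):
    """Return the dates that appear with more than one currency."""
    seen = {}
    for row in rows:
        seen.setdefault(row["date"], set()).add(row["currency"])
    return sorted(day for day, currencies in seen.items() if len(currencies) > 1)

# Two hypothetical responses for the same ticker and date.
responses = [
    {"currency": "CUR_A", "prices": [("2021-03-05", 7.448)]},
    {"currency": "CUR_B", "prices": [("2021-03-05", 6.236)]},
]
rows = tag_currency(responses)
print(mixed_currency_dates(rows))
```

With the currency kept next to each price, a caller can at least detect the switch, even if it cannot prevent it.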