yahooquery
Is the OHLC data corrected?
Just found this package as a possible alternative to yfinance and was wondering if the OHLC data is corrected, like yfinance does (partially).
What I mean is that OHLC is adjusted to the adjusted close, or that ticks are corrected where high is below close/open, and all the other fun stuff that one can find in the data Yahoo provides.
I don't have this in the history method yet but will add it in the next version release. So, stay tuned.
Take a look at the newest version, 2.2.6. There's an additional argument in the history method, adj_ohlc. Set that to True to adjust the OHLC data.
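For readers wondering what "adjusting OHLC" usually means: a minimal sketch of one common approach, scaling each row by the ratio of adjusted close to close. This is not necessarily how yahooquery implements adj_ohlc. The helper name `adjust_ohlc` is hypothetical, and the column names are assumed to match Yahoo's CSV export (Open, High, Low, Close, Adj Close).

```python
import pandas as pd


def adjust_ohlc(df: pd.DataFrame) -> pd.DataFrame:
    """Scale Open/High/Low by Adj Close / Close (hypothetical helper).

    Assumes Yahoo CSV column names; Close is replaced by Adj Close.
    """
    out = df.copy()
    # Per-row adjustment factor implied by the adjusted close.
    ratio = out["Adj Close"] / out["Close"]
    for col in ("Open", "High", "Low"):
        out[col] = out[col] * ratio
    out["Close"] = out["Adj Close"]
    return out


# Made-up sample rows: the first row has an adjustment factor of 0.5.
prices = pd.DataFrame({
    "Open": [10.0, 20.0],
    "High": [12.0, 22.0],
    "Low": [9.0, 18.0],
    "Close": [10.0, 20.0],
    "Adj Close": [5.0, 20.0],
})
adjusted = adjust_ohlc(prices)
print(adjusted)
```

Scaling all four prices by the same factor keeps the shape of each bar intact, which is what removes the artificial price jumps around splits and dividends.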
Yes, that's the primary correction to "adjusted close", which I found extremely important when working with technical indicators. That makes sense, as there are big price jumps when not correcting to the adjusted close. But what about the other problems (partially mentioned above) that occur in Yahoo's historic data?
What are some of the other problems? Could you provide any examples?
I'd love to!
- As said, sometimes the high is not the highest and the low is not the lowest price within one tick.
- Due to the adjustment to the adjusted close, there might be ticks that in fact have prices of 0.00 (especially if you keep the number of digits). That will cause problems.
- Some rows (dividends, splits and maybe holidays) will be NaN. This does not always happen, but even for the same stock, Yahoo sometimes gives you those ticks and sometimes not :/
- I'm not sure about this, but I think Yahoo might have changed the sort order of the historic data at least once (oldest date first/newest date first). So maybe the data should be sorted anyway.
- I guess this is not in the scope of a data-fetching API, but how should one deal with ticks that have 0 volume? I don't think the given prices are meaningful.
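The last two points (sort order and zero-volume ticks) can be handled defensively on the user side. A small sketch, assuming a frame with Date and Volume columns as in Yahoo's CSV export; the helper name `sort_and_flag` and the `zero_volume` column are made up for illustration, and the sample rows are invented.

```python
import pandas as pd


def sort_and_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Sort oldest-first and mark zero-volume rows (hypothetical helper)."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # Sort explicitly so the result does not depend on the provider's order.
    out = out.sort_values("Date").reset_index(drop=True)
    # Flag rather than drop, so the caller decides what to do with them.
    out["zero_volume"] = out["Volume"] == 0
    return out


raw = pd.DataFrame({
    "Date": ["2020-01-03", "2020-01-02", "2020-01-06"],
    "Close": [10.2, 10.0, 10.2],
    "Volume": [1500, 0, 2300],
})
result = sort_and_flag(raw)
print(result)
```

Flagging instead of dropping leaves the decision about zero-volume prices to whoever consumes the data.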
Thanks for expanding on your first post. I understand what you’re saying, but I haven’t seen any data come back like that yet. Do you know of any tickers that show data coming back like that?
Sure.
This is historic data (max time period, daily frequency), downloaded as CSV directly from Yahoo.
For data containing NaN rows:
import pandas as pd
# Show every row that contains at least one NaN value
d = pd.read_csv('RAW.DE.csv')
print(d[d.isna().any(axis=1)])
And for rows with an invalid high/low:
# Show rows where High is not the highest or Low is not the lowest price
d = pd.read_csv('HEN3.DE.csv')
print(d[d.eval('High < Open or High < Close or Low > Open or Low > Close')])
Thanks again for providing the examples. That definitely seems to be a problem. I'm not sure how I'd go about fixing that; do you have any recommendations?
Well, this is another part of the story, as it depends on what the user of the API is going to do with the data. Personally, I drop rows that have NaN or 0.0 prices and correct the high to the highest price of the row and the low to the lowest. I guess the best thing to do is to provide those corrections as optional parameters and let the user decide.
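A minimal sketch of those corrections as optional parameters, assuming Yahoo CSV column names (Open, High, Low, Close). The function name `clean_ohlc` and its flags are hypothetical and not part of yahooquery; the sample rows are invented.

```python
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def clean_ohlc(df, drop_nan=True, drop_zero=True, fix_high_low=True):
    """Apply optional OHLC corrections (hypothetical helper)."""
    out = df.copy()
    if drop_nan:
        # Remove rows where any price is missing.
        out = out.dropna(subset=PRICE_COLS)
    if drop_zero:
        # Keep only rows where every price is strictly positive.
        out = out[(out[PRICE_COLS] > 0).all(axis=1)].copy()
    if fix_high_low:
        # Compute both extremes first, then assign them.
        row_max = out[PRICE_COLS].max(axis=1)
        row_min = out[PRICE_COLS].min(axis=1)
        out["High"] = row_max
        out["Low"] = row_min
    return out


raw = pd.DataFrame({
    "Open": [10.0, None, 0.0, 5.0],
    "High": [9.5, 11.0, 0.0, 6.0],
    "Low": [9.0, 10.0, 0.0, 5.5],
    "Close": [9.8, 10.5, 0.0, 5.8],
})
cleaned = clean_ohlc(raw)
print(cleaned)
```

In the sample, the row with a missing open and the all-zero row are dropped, the first row's high is raised to its open, and the last row's low is lowered to its open. Each step can be switched off independently.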
The high and low should always be corrected by the package. There is no use case in which the high shouldn't be high and the low shouldn't be low. This should not be a user option. Right now I'm having to do these corrections manually after retrieving the data.
This has little to nothing to do with optional adjustments for dividends and splits.
Happy to accept a PR to fix this but not something I'm going to fix.
For info, market_prices gets prices via yahooquery and does correct OHLC.