is the ohlc data corrected?

Open slapslash opened this issue 4 years ago • 10 comments

Just found this package as a possible alternative to yfinance and was wondering whether the OHLC data is corrected, like yfinance does (partially).
What I mean is that OHLC is adjusted to the adjusted close, or that ticks are corrected where high is below close/open, and all the other fun stuff that one can find in the data Yahoo provides.

slapslash avatar Jul 31 '20 06:07 slapslash

I don't have this in the history method yet but will add it in the next version release. So, stay tuned.

dpguthrie avatar Jul 31 '20 21:07 dpguthrie

Take a look at the newest version, 2.2.6. There's an additional argument in the history method, adj_ohlc. Set that to True to adjust the OHLC data.

dpguthrie avatar Aug 02 '20 22:08 dpguthrie

Yes, that's the primary correction to "adjusted close", which I found extremely important when working with technical indicators. That makes sense, as there are big price jumps when not correcting to aclose. But what about the other problems (partially mentioned above) that occur in Yahoo historic data?
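For readers unfamiliar with the "correction to adjusted close" mentioned here, a minimal sketch of one common approach follows: scale Open, High and Low by the per-row ratio of adjusted close to close. This is an illustration only, not necessarily what `adj_ohlc` does internally. The function name `adjust_ohlc`, the column names (taken from Yahoo's CSV layout) and the sample numbers are assumptions.

```python
import pandas as pd


def adjust_ohlc(df: pd.DataFrame) -> pd.DataFrame:
    """Scale Open/High/Low by Adj Close / Close (hypothetical helper)."""
    out = df.copy()
    # Per-row adjustment factor; 1.0 where no split/dividend adjustment applies.
    factor = out["Adj Close"] / out["Close"]
    for col in ("Open", "High", "Low"):
        out[col] = out[col] * factor
    # After adjustment the close is simply the adjusted close.
    out["Close"] = out["Adj Close"]
    return out.drop(columns="Adj Close")


# Made-up prices: the first row predates a 2-for-1 split.
raw = pd.DataFrame({
    "Open": [100.0, 51.0],
    "High": [104.0, 53.0],
    "Low": [98.0, 50.0],
    "Close": [102.0, 52.0],
    "Adj Close": [51.0, 52.0],
})
adjusted = adjust_ohlc(raw)
print(adjusted)
```

Without this scaling, the unadjusted series would show a drop from about 102 to 52 between the two rows, which is the kind of jump that distorts technical indicators.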

slapslash avatar Aug 04 '20 06:08 slapslash

What are some of the other problems? Could you provide any examples?

dpguthrie avatar Aug 08 '20 19:08 dpguthrie

I'd love to!

  • As said, sometimes high is not the highest and low is not the lowest price within one tick.
  • Due to the adjustment to aclose, there might be ticks that in fact have prices of 0.00 (especially if you keep the number of digits). That will cause problems.
  • Some rows (dividends, splits and maybe holidays) will be nan. This does not always happen, but even for the same stock, Yahoo will sometimes give you those ticks and sometimes not :/
  • I'm not sure about this, but I do think that Yahoo might have changed the sorting of historic data at least once (oldest date first/newest date first). So maybe the data should be sorted anyway.
  • I guess this is not in the scope of a data fetching API, but how to deal with ticks that have 0 volume? I don't think the given prices are meaningful.
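
A rough way to check a downloaded frame for the nan, zero-price, zero-volume and sorting issues in this list might look like the sketch below. The helper name `report_issues` is hypothetical, the column names assume Yahoo's CSV layout, and the sample rows are invented for illustration.

```python
import pandas as pd


def report_issues(df: pd.DataFrame) -> dict:
    """Count suspicious rows in a Yahoo-style OHLCV frame (hypothetical helper)."""
    prices = df[["Open", "High", "Low", "Close"]]
    dates = pd.to_datetime(df["Date"])
    return {
        "rows_with_nan": int(prices.isna().any(axis=1).sum()),
        "rows_with_zero_price": int((prices == 0).any(axis=1).sum()),
        "rows_with_zero_volume": int((df["Volume"] == 0).sum()),
        "sorted_oldest_first": bool(dates.is_monotonic_increasing),
    }


# Invented sample: newest date first, one nan row, one all-zero row.
sample = pd.DataFrame({
    "Date": ["2020-08-05", "2020-08-04", "2020-08-03"],
    "Open": [10.0, None, 0.0],
    "High": [11.0, None, 0.0],
    "Low": [9.5, None, 0.0],
    "Close": [10.5, None, 0.0],
    "Volume": [1200, 0, 0],
})
report = report_issues(sample)
print(report)
```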

slapslash avatar Aug 10 '20 13:08 slapslash

Thanks for expanding on your first post. I understand what you’re saying, but I haven’t seen any data come back like that yet. Do you know of any tickers that show data coming back like that?

dpguthrie avatar Aug 13 '20 02:08 dpguthrie

Sure,

Historic data (max time period, daily frequency), downloaded as CSV directly from Yahoo.

For data having nan:

import pandas as pd

d = pd.read_csv('RAW.DE.csv')
print(d[d.isna().any(axis = 1)])

and for rows with an invalid high/low:

d = pd.read_csv('HEN3.DE.csv')
print(d[d.eval('High < Open or High < Close or Low > Open or Low > Close')])

slapslash avatar Aug 13 '20 11:08 slapslash

Thanks again for providing the examples. That definitely seems to be a problem. I'm not sure how I'd go about fixing that; do you have any recommendations?

dpguthrie avatar Aug 18 '20 15:08 dpguthrie

Well, this is another part of the story, as it depends on what the user of the API is going to do with the data. Personally, I drop rows having nan or 0.0 prices and correct high to the highest price of the row and low to the lowest. I guess the best thing to do is to provide those corrections as optional parameters and let the user decide.
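
The corrections described here could be sketched roughly as follows, with each one behind an optional flag. The function `clean_ohlc` and its parameters `drop_invalid` and `fix_high_low` are hypothetical names, not part of yahooquery, and the sample rows are made up.

```python
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def clean_ohlc(df: pd.DataFrame, drop_invalid: bool = True,
               fix_high_low: bool = True) -> pd.DataFrame:
    """Apply optional corrections to an OHLC frame (hypothetical helper)."""
    out = df.copy()
    if drop_invalid:
        prices = out[PRICE_COLS]
        # Drop rows where any price is nan or exactly 0.0.
        bad = prices.isna().any(axis=1) | (prices == 0).any(axis=1)
        out = out.loc[~bad].copy()
    if fix_high_low:
        # Compute both from the untouched row before overwriting either column.
        high = out[PRICE_COLS].max(axis=1)
        low = out[PRICE_COLS].min(axis=1)
        out["High"] = high
        out["Low"] = low
    return out


# Invented sample: row 0 has High below Open, row 1 has Low above Open.
raw = pd.DataFrame({
    "Open": [10.0, 20.0, None, 0.0],
    "High": [9.8, 21.0, None, 0.0],
    "Low": [9.0, 20.2, None, 0.0],
    "Close": [9.5, 20.5, None, 0.0],
})
cleaned = clean_ohlc(raw)
print(cleaned)
```

Passing `drop_invalid=False` or `fix_high_low=False` leaves the corresponding rows or columns as Yahoo delivered them, so the caller decides which corrections to apply.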

slapslash avatar Sep 15 '20 09:09 slapslash

The high and low should always be corrected by the package. There is no use case in which the high shouldn't be high and the low shouldn't be low. This should not be a user option. Right now I'm having to do these corrections manually after retrieving the data.

This has little to nothing to do with optional adjustments for dividends and splits.

impredicative avatar Mar 27 '21 16:03 impredicative

Happy to accept a PR to fix this but not something I'm going to fix.

dpguthrie avatar Oct 15 '22 20:10 dpguthrie

For info, market_prices gets prices via yahooquery and does correct ohlc.

maread99 avatar Nov 21 '22 12:11 maread99