Duplicate Entries & Missing Volumes
A) yf.download returns duplicate entries when downloading prices. This seems to be a perennial issue that happens almost every day, for many tickers. The output below is for Dr. Reddy (DRREDDY.NS) and HindPetro (HINDPETRO.NS), each of which has a duplicate entry for 2021-07-09, one of them all NaN.
>>> yf.download('DRREDDY.NS', period='5d', progress=False)
                   Open         High          Low        Close    Adj Close    Volume
Date
2021-07-05  5576.200195  5596.000000  5531.950195  5537.899902  5512.574707  297341.0
2021-07-06  5539.799805  5561.200195  5514.000000  5539.450195  5514.117676  178108.0
2021-07-07  5529.000000  5614.600098  5515.049805  5562.100098  5536.664062  538508.0
2021-07-08  5540.000000  5588.000000  5449.600098  5466.750000  5441.750000  451167.0
2021-07-09          NaN          NaN          NaN          NaN          NaN       NaN
2021-07-09  5448.000000  5520.250000  5416.950195  5460.299805  5460.299805  460617.0
>>> yf.download('HINDPETRO.NS', period='5d', progress=False)
                  Open        High         Low       Close   Adj Close     Volume
Date
2021-07-05  301.399994  305.799988  298.500000  304.450012  259.252106  4835974.0
2021-07-06  306.500000  309.450012  303.200012  304.549988  259.337250  6501684.0
2021-07-07  306.250000  307.100006  299.500000  306.350006  260.870026  6552189.0
2021-07-08  285.000000  289.399994  283.049988  283.850006  261.100006  9236777.0
2021-07-09         NaN         NaN         NaN         NaN         NaN        NaN
2021-07-09  284.100006  285.000000  278.049988  278.950012  278.950012  2925669.0
How do we fix this? Is there a permanent fix?
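One possible client-side workaround is to drop the all-NaN rows and then keep a single row per date. The sketch below is only an assumption about how the result could be cleaned after download, not the library's own fix. The helper name `drop_duplicate_dates` is hypothetical, and the sample frame is built by hand from the DRREDDY.NS values above so that it runs without network access.

```python
import numpy as np
import pandas as pd


def drop_duplicate_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: remove all-NaN rows, then keep one row per date."""
    # Drop rows where every column is NaN (the empty 2021-07-09 row).
    cleaned = df.dropna(how="all")
    # If a date still appears more than once, keep the last row for it.
    return cleaned[~cleaned.index.duplicated(keep="last")]


# Hand-built sample mirroring the last rows of the DRREDDY.NS output.
sample = pd.DataFrame(
    {
        "Close": [5466.75, np.nan, 5460.299805],
        "Volume": [451167.0, np.nan, 460617.0],
    },
    index=pd.DatetimeIndex(["2021-07-08", "2021-07-09", "2021-07-09"], name="Date"),
)

cleaned = drop_duplicate_dates(sample)
print(cleaned)
```

In practice the same helper could be applied to the frame returned by yf.download. Keeping the last duplicate is a design choice made here for illustration; it assumes the later row is the more complete one.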
B) The volumes for indices are backfilled rather than reported point-in-time, and in a few instances they are sparse. For example, the ^NSEI volume was 0 for 2021-07-09, but such values get backfilled after two or three days. For ^NSEBANK, the volumes are sparse. How do we address this inconsistency?
>>> yf.download('^NSEI', period='5d', progress=False)
                    Open          High           Low         Close     Adj Close  Volume
Date
2021-07-05  15793.400391  15845.950195  15762.049805  15834.349609  15834.349609  207000
2021-07-06  15813.750000  15914.200195  15801.000000  15818.250000  15818.250000  391400
2021-07-07  15819.599609  15893.549805  15779.700195  15879.650391  15879.650391  329300
2021-07-08  15855.400391  15885.750000  15682.900391  15727.900391  15727.900391  307900
2021-07-09  15688.250000  15730.849609  15632.750000  15689.799805  15689.799805       0
>>> yf.download('^NSEBANK', period='10d', progress=False)
                    Open          High           Low         Close     Adj Close  Volume
Date
2021-06-28  35488.199219  35576.949219  35236.449219  35359.449219  35359.449219       0
2021-06-29  35320.449219  35337.449219  34913.699219  35010.300781  35010.300781       0
2021-06-30  35001.898438  35214.898438  34730.449219  34772.199219  34772.199219       0
2021-07-01  34866.000000  34917.648438  34650.949219  34684.000000  34684.000000  164000
2021-07-02  34728.101562  34894.449219  34632.601562  34809.898438  34809.898438  165600
2021-07-05  35010.949219  35234.300781  34926.398438  35212.000000  35212.000000       0
2021-07-06  35173.601562  35807.449219  35165.550781  35579.148438  35579.148438       0
2021-07-07  35550.601562  35795.750000  35427.648438  35771.300781  35771.300781       0
2021-07-08  35603.250000  35811.000000  35134.648438  35274.101562  35274.101562       0
2021-07-09  35163.750000  35225.199219  34859.898438  35071.949219  35071.949219       0
Please let me know if this still happens with ver 0.1.63
The problem still persists. I tried to load 'EURUSD=x' with interval='1m' and period='1d' and got just 32 rows, although this should return more than 900 rows. Version: yfinance 0.1.63. Please fix it soon.
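To make a report like this easier to reproduce, it can help to show where the missing bars are, not only the row count. The following is a minimal sketch, assuming the downloaded frame has a DatetimeIndex. The helper name `find_gaps` is hypothetical, and the index here is synthetic rather than real EURUSD data.

```python
import pandas as pd


def find_gaps(index: pd.DatetimeIndex, expected: str = "1min") -> pd.Series:
    """Hypothetical helper: return the time deltas larger than the expected bar spacing."""
    # Difference between each timestamp and the previous one.
    deltas = index.to_series().diff().dropna()
    # Each remaining entry is labelled by the timestamp that follows a gap.
    return deltas[deltas > pd.Timedelta(expected)]


# Synthetic 1-minute index with the bars between 00:02 and 00:10 missing.
idx = pd.DatetimeIndex(
    ["2021-07-12 00:00", "2021-07-12 00:01", "2021-07-12 00:02", "2021-07-12 00:10"]
)

gaps = find_gaps(idx)
print(len(idx), "rows;", len(gaps), "gap(s)")
print(gaps)
```

With a real download, `idx` would be the index of the returned frame.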
Hi @ranaroussi the duplicate entries (NaN) issue seems fixed. Thanks!
However, the backfilling of volumes for ^NSEI and the missing volumes for ^NSEBANK still continue. See below, where Friday's volume (2021-07-09) has been backfilled by Monday, while Monday's volume (2021-07-12) is 0.
>>> yf.download('^NSEI', period='5d', progress=False)
                    Open          High           Low         Close     Adj Close  Volume
Date
2021-07-06  15813.750000  15914.200195  15801.000000  15818.250000  15818.250000  391400
2021-07-07  15819.599609  15893.549805  15779.700195  15879.650391  15879.650391  329300
2021-07-08  15855.400391  15885.750000  15682.900391  15727.900391  15727.900391  307900
2021-07-09  15688.250000  15730.849609  15632.750000  15689.799805  15689.799805  243200
2021-07-12  15766.799805  15789.200195  15644.750000  15692.599609  15692.599609       0
Can we have a solution to this volume backfilling issue?
Duplicate date entries issue fixed.
I can't think of any solution to the volume "backfilling" because this is a problem in Yahoo's data, and all interval sizes are affected.
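Since the source data cannot be corrected on the client, one hedged workaround is to mark a reported index volume of 0 as missing, so that downstream code does not treat it as a real observation. This is a sketch under the assumption that 0 means "not yet published" for these indices; a genuine zero would be masked as well. The helper name `mask_zero_volume` is hypothetical, and the sample rows are copied from the ^NSEI output above.

```python
import pandas as pd


def mask_zero_volume(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace a Volume of 0 with NaN, leaving the input untouched."""
    out = df.copy()
    # where() keeps values that satisfy the condition and puts NaN elsewhere.
    out["Volume"] = out["Volume"].where(out["Volume"] != 0)
    return out


# Hand-built sample taken from the last rows of the ^NSEI output.
sample = pd.DataFrame(
    {
        "Close": [15727.900391, 15689.799805, 15692.599609],
        "Volume": [307900, 243200, 0],
    },
    index=pd.DatetimeIndex(["2021-07-08", "2021-07-09", "2021-07-12"], name="Date"),
)

masked = mask_zero_volume(sample)
print(masked)
```

Rows masked this way could be downloaded again a few days later, once Yahoo has filled in the volume.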
Is the backfilling still occurring?