yfinance

Duplicate Entries & Missing Volumes

Open kannansingaravelu opened this issue 4 years ago • 5 comments

A) There are duplicate entries when downloading prices with yf.download. This seems to be a perennial issue: it happens almost every day, for many tickers. The output below, for Dr. Reddy's and HindPetro, shows a duplicate entry on 2021-07-09: an all-NaN row followed by the actual prices.

>>> import yfinance as yf
>>> yf.download('DRREDDY.NS', period='5d', progress=False)

                   Open         High          Low        Close    Adj Close    Volume
Date                                                                                 
2021-07-05  5576.200195  5596.000000  5531.950195  5537.899902  5512.574707  297341.0
2021-07-06  5539.799805  5561.200195  5514.000000  5539.450195  5514.117676  178108.0
2021-07-07  5529.000000  5614.600098  5515.049805  5562.100098  5536.664062  538508.0
2021-07-08  5540.000000  5588.000000  5449.600098  5466.750000  5441.750000  451167.0
2021-07-09          NaN          NaN          NaN          NaN          NaN       NaN
2021-07-09  5448.000000  5520.250000  5416.950195  5460.299805  5460.299805  460617.0

>>> yf.download('HINDPETRO.NS', period='5d', progress=False)

                  Open        High         Low       Close   Adj Close     Volume
Date                                                                             
2021-07-05  301.399994  305.799988  298.500000  304.450012  259.252106  4835974.0
2021-07-06  306.500000  309.450012  303.200012  304.549988  259.337250  6501684.0
2021-07-07  306.250000  307.100006  299.500000  306.350006  260.870026  6552189.0
2021-07-08  285.000000  289.399994  283.049988  283.850006  261.100006  9236777.0
2021-07-09         NaN         NaN         NaN         NaN         NaN        NaN
2021-07-09  284.100006  285.000000  278.049988  278.950012  278.950012  2925669.0

How do we fix this? Is there a permanent fix?
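Until a fixed release is available, a minimal client-side workaround is sketched below, assuming a frame shaped like the outputs above (a `Date` index plus OHLCV columns). It drops rows in which every column is NaN, then keeps the last row for any date that still repeats. `drop_empty_duplicates` is a hypothetical helper, not part of yfinance, and the sample frame is rebuilt by hand from the HINDPETRO.NS output (two columns only).

```python
import numpy as np
import pandas as pd


def drop_empty_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Remove all-NaN rows, then keep the last row for each repeated date.

    Hypothetical helper for frames returned by yf.download; not a yfinance API.
    """
    cleaned = df.dropna(how="all")
    # If a date still appears twice, assume the later row is the more complete one.
    return cleaned[~cleaned.index.duplicated(keep="last")]


# Tail of the HINDPETRO.NS output from this issue (Close and Volume only).
idx = pd.DatetimeIndex(["2021-07-08", "2021-07-09", "2021-07-09"], name="Date")
sample = pd.DataFrame(
    {"Close": [283.850006, np.nan, 278.950012], "Volume": [9236777.0, np.nan, 2925669.0]},
    index=idx,
)
print(drop_empty_duplicates(sample))
```

On live data the same call would wrap the download, e.g. `drop_empty_duplicates(yf.download('DRREDDY.NS', period='5d', progress=False))`.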

B) Volumes for indices are backfilled rather than reported point-in-time, and in some instances they are sparse. For example, the ^NSEI volume for 2021-07-09 was 0, but such values get backfilled after two or three days. For ^NSEBANK, volumes are sparse: most days show 0. How do we address this inconsistency?

>>> yf.download('^NSEI', period='5d', progress=False)

                    Open          High           Low         Close     Adj Close  Volume
Date                                                                                    
2021-07-05  15793.400391  15845.950195  15762.049805  15834.349609  15834.349609  207000
2021-07-06  15813.750000  15914.200195  15801.000000  15818.250000  15818.250000  391400
2021-07-07  15819.599609  15893.549805  15779.700195  15879.650391  15879.650391  329300
2021-07-08  15855.400391  15885.750000  15682.900391  15727.900391  15727.900391  307900
2021-07-09  15688.250000  15730.849609  15632.750000  15689.799805  15689.799805       0

>>> yf.download('^NSEBANK', period='10d', progress=False)

                    Open          High           Low         Close     Adj Close  Volume
Date                                                                                    
2021-06-28  35488.199219  35576.949219  35236.449219  35359.449219  35359.449219       0
2021-06-29  35320.449219  35337.449219  34913.699219  35010.300781  35010.300781       0
2021-06-30  35001.898438  35214.898438  34730.449219  34772.199219  34772.199219       0
2021-07-01  34866.000000  34917.648438  34650.949219  34684.000000  34684.000000  164000
2021-07-02  34728.101562  34894.449219  34632.601562  34809.898438  34809.898438  165600
2021-07-05  35010.949219  35234.300781  34926.398438  35212.000000  35212.000000       0
2021-07-06  35173.601562  35807.449219  35165.550781  35579.148438  35579.148438       0
2021-07-07  35550.601562  35795.750000  35427.648438  35771.300781  35771.300781       0
2021-07-08  35603.250000  35811.000000  35134.648438  35274.101562  35274.101562       0
2021-07-09  35163.750000  35225.199219  34859.898438  35071.949219  35071.949219       0
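One way to keep these zeros from being read as "no trading" is to treat them as missing, and to measure how sparse a series is before relying on it. The sketch below assumes that a 0 volume on a day with price movement means "not reported (yet)"; both helpers are hypothetical, and the sample rebuilds the ^NSEBANK Volume column above.

```python
import numpy as np
import pandas as pd


def mark_missing_volume(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where Volume == 0 is replaced by NaN.

    Assumption: for an index such as ^NSEI or ^NSEBANK, a 0 means
    "not reported (yet)" rather than "no trading", since prices moved.
    """
    out = df.copy()
    out["Volume"] = out["Volume"].astype("float64").replace(0.0, np.nan)
    return out


def missing_volume_share(df: pd.DataFrame) -> float:
    """Fraction of rows whose Volume is 0 or NaN."""
    vol = df["Volume"]
    return float(((vol == 0) | vol.isna()).mean())


# Volume column of the ^NSEBANK output above (10 trading days).
idx = pd.to_datetime([
    "2021-06-28", "2021-06-29", "2021-06-30", "2021-07-01", "2021-07-02",
    "2021-07-05", "2021-07-06", "2021-07-07", "2021-07-08", "2021-07-09",
])
nsebank = pd.DataFrame({"Volume": [0, 0, 0, 164000, 165600, 0, 0, 0, 0, 0]}, index=idx)

print(missing_volume_share(nsebank))  # 0.8 (8 of 10 rows)
print(mark_missing_volume(nsebank)["Volume"].isna().sum())  # 8
```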

kannansingaravelu, Jul 09 '21 20:07

Please let me know if this still happens with ver 0.1.63

ranaroussi, Jul 10 '21 20:07

This problem still persists: when I load 'EURUSD=x' with interval='1m', period='1d', I get only 32 rows, whereas it should return well over 900 rows. Version: yfinance 0.1.63. Please fix it soon.
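To tell whether a short result comes from gaps inside the session or from a shorter time span, one can list the 1-minute timestamps missing between the first and last bar. A minimal sketch, assuming a market that trades continuously over that span (as FX pairs do on weekdays; for exchange-listed symbols, gaps outside trading hours are expected). `missing_minutes` is hypothetical and the sample index is made up for illustration.

```python
import pandas as pd


def missing_minutes(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return the 1-minute timestamps between the first and last bar that are absent.

    Assumes continuous trading over that span; not a yfinance API.
    """
    if index.empty:
        return pd.DatetimeIndex([])
    expected = pd.date_range(index.min(), index.max(), freq="min")
    return expected.difference(index)


# Hypothetical 1-minute index with two bars missing (10:02 and 10:03).
bars = pd.DatetimeIndex(["2021-07-12 10:00", "2021-07-12 10:01", "2021-07-12 10:04"])
gaps = missing_minutes(bars)
print(len(gaps))  # 2
```

Applied to the index of the 1-minute download, `len(df)` and `len(missing_minutes(df.index))` together show whether the expected bars are missing inside the returned span or the span itself is short.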

akashva50, Jul 12 '21 17:07

> Please let me know if this still happens with ver 0.1.63

Hi @ranaroussi, the duplicate-entry (NaN row) issue seems fixed. Thanks!

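However, the volume backfilling for ^NSEI and the missing volumes for ^NSEBANK persist. In the output below, the volume for Friday 2021-07-09, which was 0 when downloaded on Friday, has been backfilled by Monday, while Monday's (2021-07-12) volume now shows 0.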

>>> yf.download('^NSEI', period='5d', progress=False)

                    Open          High           Low         Close     Adj Close  Volume
Date
2021-07-06  15813.750000  15914.200195  15801.000000  15818.250000  15818.250000  391400
2021-07-07  15819.599609  15893.549805  15779.700195  15879.650391  15879.650391  329300
2021-07-08  15855.400391  15885.750000  15682.900391  15727.900391  15727.900391  307900
2021-07-09  15688.250000  15730.849609  15632.750000  15689.799805  15689.799805  243200
2021-07-12  15766.799805  15789.200195  15644.750000  15692.599609  15692.599609       0

Can we have a solution to this volume backfilling issue?
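The backfilling can be made visible by comparing two downloads of the same ticker taken on different days. The sketch below, a hypothetical helper rather than a yfinance feature, reports dates whose volume was 0 in the earlier download but positive in the later one; the sample series are the ^NSEI volumes from the Jul 9 and Jul 12 outputs in this thread.

```python
import pandas as pd


def backfilled_volumes(earlier: pd.Series, later: pd.Series) -> pd.DataFrame:
    """List dates whose Volume was 0 in an earlier download but positive in a later one.

    Only dates present in both downloads are compared (inner join).
    """
    both = pd.concat({"earlier": earlier, "later": later}, axis=1, join="inner")
    changed = (both["earlier"] == 0) & (both["later"] > 0)
    return both[changed]


# ^NSEI Volume as downloaded on Jul 9 and on Jul 12 (values from this thread).
jul9 = pd.Series(
    [207000, 391400, 329300, 307900, 0],
    index=pd.to_datetime(["2021-07-05", "2021-07-06", "2021-07-07", "2021-07-08", "2021-07-09"]),
)
jul12 = pd.Series(
    [391400, 329300, 307900, 243200, 0],
    index=pd.to_datetime(["2021-07-06", "2021-07-07", "2021-07-08", "2021-07-09", "2021-07-12"]),
)
print(backfilled_volumes(jul9, jul12))
```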

kannansingaravelu, Jul 12 '21 19:07

Duplicate date entries issue fixed.

I can't think of any solution to the volume "backfilling" because this is a problem in Yahoo's data, and all interval sizes are affected.
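Since the revision happens in Yahoo's data, one client-side mitigation (an assumption, not something yfinance does) is to treat the most recent sessions as provisional: re-download a short trailing window on each run and let it overwrite the matching rows of a local cache. `refresh_recent` is a hypothetical helper; the sample values are the ^NSEI volumes from this thread.

```python
import pandas as pd


def refresh_recent(cached: pd.DataFrame, fresh: pd.DataFrame) -> pd.DataFrame:
    """Merge a newly downloaded window over locally cached history.

    Values in `fresh` win (so backfilled volumes replace stale zeros);
    rows only in `cached` are kept. Assumes matching index and columns.
    """
    return fresh.combine_first(cached).sort_index()


# Cached copy from Jul 9 vs. a fresh download on Jul 12 (^NSEI Volume).
cached = pd.DataFrame({"Volume": [307900, 0]}, index=pd.to_datetime(["2021-07-08", "2021-07-09"]))
fresh = pd.DataFrame({"Volume": [243200, 0]}, index=pd.to_datetime(["2021-07-09", "2021-07-12"]))
merged = refresh_recent(cached, fresh)
print(merged)
```

`DataFrame.combine_first` takes values from `fresh` where present and falls back to `cached` elsewhere, so older history is preserved while the last few rows are updated on each run.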

ValueRaider, Jan 06 '23 18:01

Is the backfilling still occurring?

ValueRaider, Mar 14 '24 19:03