pandas-datareader

World Bank data is not retrieved one country x year per row

Open ozak opened this issue 6 years ago • 0 comments

When downloading data from the World Bank (WB), the downloader seems to generate multiple rows per country, even if only one year is requested. I wonder if this is due to the indicators coming from different sources? E.g.

wb.download(indicator=['SP.POP.0014.FE.IN', 'SP.POP.0004.FE'], country=['COL'], start=2017, end=2017)

returns

              SP.POP.0014.FE.IN  SP.POP.0004.FE
country  year                                   
Colombia 2017          5631271.0             NaN
         2017                NaN       1814549.0

requiring a groupby(['country', 'year']).max() to get the correct dataset

 wb.download(indicator=['SP.POP.0014.FE.IN', 'SP.POP.0004.FE'], country=['COL'], start=2017, end=2017).groupby(['country', 'year']).max()
Out[54]: 
               SP.POP.0014.FE.IN  SP.POP.0004.FE
country  year                                   
Colombia 2017          5631271.0       1814549.0
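
For anyone hitting the same thing, here is a minimal offline sketch of the workaround. It does not call the World Bank API: the frame `raw` is built by hand to mimic the output shown above. The helper name `collapse_duplicates` is hypothetical, and treating `year` as a string index level is an assumption. It uses `groupby(...).first()`, which keeps the first non-null value per column, as an alternative to `.max()`.

```python
import numpy as np
import pandas as pd

# Hand-built frame mimicking the reported output (no network access needed).
# Assumption: the 'year' index level holds strings.
index = pd.MultiIndex.from_tuples(
    [("Colombia", "2017"), ("Colombia", "2017")],
    names=["country", "year"],
)
raw = pd.DataFrame(
    {
        "SP.POP.0014.FE.IN": [5631271.0, np.nan],
        "SP.POP.0004.FE": [np.nan, 1814549.0],
    },
    index=index,
)


def collapse_duplicates(df):
    """Merge rows that share a (country, year) key.

    GroupBy.first() skips NaN by default, so each column keeps its
    first non-null value within the group.
    """
    return df.groupby(level=["country", "year"]).first()


# Duplicated index entries are what make the frame "not one row per country x year".
print(raw.index.duplicated().any())

tidy = collapse_duplicates(raw)
print(tidy)
```

In this example `.first()` and `.max()` give the same result, because each column has exactly one non-null value per group. They would differ only if the duplicated rows carried conflicting values for the same indicator, in which case neither is obviously "correct" and the raw rows are worth inspecting.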

ozak, Oct 08 '19 21:10