WDI
Duplicated values for many developing countries
When I extract data, the results come back duplicated for most developing countries.
For instance, in the code below, Angola, Mozambique, and Congo Republic are duplicated, with some rows keyed by ISO2 codes and others by ISO3 codes.
library(WDI)

dat <- WDI(indicator = c("DT.ODA.ODAT.GN.ZS", "NY.GDP.MKTP.CD", "DT.ODA.ODAT.GD.ZS",
                         "DT.ODA.ALLD.GD.ZS", "DT.ODA.ODAT.XP.ZS", "DT.ODA.ALLD.XP.ZS",
                         "NY.GNP.MKTP.CD"),
           country = c("AO", "CG", "SA", "MZ"), start = 1980, end = 2020)
However, it seems that some indicators are keyed by different ISO codes, because the code below does not return duplicates for any of those countries:
dat <- WDI(indicator = "DT.ODA.ODAT.GN.ZS",
           country = c("AGO", "CG", "SA", "MZ"), start = 1980, end = 2020)
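One way to confirm the duplication is to look for country-year pairs that appear under more than one code in the iso2c column. This is a minimal sketch, assuming the data frame has the country, year and iso2c columns that WDI() returns and that both copies of a row carry the same country name. find_code_duplicates() is a hypothetical helper, not part of WDI, and the small data frame is made up so the example runs offline.

```r
# Hypothetical helper (not part of WDI): return the country-year pairs that
# appear under more than one code in the iso2c column.
find_code_duplicates <- function(dat) {
  # Count distinct codes per country-year (rows with NA codes are dropped).
  n_codes <- aggregate(iso2c ~ country + year, data = dat,
                       FUN = function(x) length(unique(x)))
  names(n_codes)[names(n_codes) == "iso2c"] <- "n_codes"
  # Keep only the pairs keyed by more than one code.
  n_codes[n_codes$n_codes > 1, , drop = FALSE]
}

# Made-up example: Angola appears under both "AO" and "AGO" in 2000.
dat <- data.frame(
  iso2c   = c("AO", "AGO", "MZ"),
  country = c("Angola", "Angola", "Mozambique"),
  year    = c(2000, 2000, 2000),
  stringsAsFactors = FALSE
)
find_code_duplicates(dat)
```

Run on a real download, an empty result would mean no country-year pair is keyed by two different codes.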
Thanks for the report. Does one of those indicators return 3-letter codes when you call the WDI function with just that one indicator?
If so, this would indicate a problem in the World Bank API rather than in the WDI package (which is what I suspect, actually).
Hi,
Yes, when called individually, many of these indicators return iso2 codes and some return iso3 codes.
For example, "DT.ODA.ODAT.GD.ZS" and "DT.ODA.ALLD.GD.ZS" return iso3 codes. I believe there are many more, but I did not check all of them.
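The per-indicator check discussed above (calling WDI with one indicator at a time and looking at the width of the returned codes) can be sketched as follows. id_widths() and the results list are hypothetical; the data frames are made-up stand-ins for real downloads, so the sketch runs without network access.

```r
# Hypothetical helper (not part of WDI): report which code widths appear in
# the iso2c column of a single-indicator download (2 = ISO2, 3 = ISO3).
id_widths <- function(d) {
  codes <- as.character(d$iso2c)          # guard against factor columns
  widths <- nchar(codes[!is.na(codes)])   # ignore missing codes
  paste(sort(unique(widths)), collapse = "/")
}

# In practice 'results' would hold one data frame per indicator, e.g.
#   results <- lapply(indicators, function(i) WDI::WDI(indicator = i,
#                     country = c("AO", "CG", "SA", "MZ"), start = 1980, end = 2020))
# Made-up stand-ins with hypothetical names are used here instead.
results <- list(
  ind_iso2 = data.frame(iso2c = c("AO", "MZ", NA), stringsAsFactors = FALSE),
  ind_iso3 = data.frame(iso2c = c("AGO", "MOZ"), stringsAsFactors = FALSE)
)
sapply(results, id_widths)
```

A value of "2/3" for an indicator would mean that a single download mixes both kinds of codes.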
Diagnostic notes:
This code line in WDI assigns the first field of country (named "id") to "iso2c".
In this query, id is an "iso3c" code and the countryiso3code field is empty:
https://api.worldbank.org/v2/en/country/AGO;COG;SAU;MOZ/indicator/DT.ODA.ODAT.GD.ZS?format=json
In this query, id is an "iso2c" code and the countryiso3code field contains a valid "iso3c" code:
https://api.worldbank.org/v2/en/country/AO;CG;SA;MZ/indicator/DT.ODA.ODAT.GN.ZS?format=json
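The two queries above suggest a fallback rule: prefer countryiso3code when it is filled, and otherwise use the country id if it is already three characters long. The sketch below assumes that rule holds generally, although it is inferred from these two queries only. pick_iso3() is a hypothetical helper that works on already-parsed field values, not on the raw JSON.

```r
# Hypothetical helper: given the "country" id and the "countryiso3code"
# field of World Bank API records, return an ISO3 code per record.
# Assumption: when countryiso3code is empty, a 3-character id is already ISO3.
pick_iso3 <- function(id, iso3_field) {
  has_field <- !is.na(iso3_field) & nzchar(iso3_field)
  ifelse(has_field, iso3_field,
         ifelse(!is.na(id) & nchar(id) == 3, id, NA_character_))
}

# Field values shaped like the two queries above:
#   first query:  country id "AGO", countryiso3code ""
#   second query: country id "AO",  countryiso3code "AGO"
pick_iso3(id = c("AGO", "AO"), iso3_field = c("", "AGO"))
```

Records that have neither a filled countryiso3code nor a 3-character id come back as NA rather than being guessed.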
Hi @vincentarelbundock, I think you can fix this by putting the following lines after dat2 is defined in wdi.dl():
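# If the codes in the iso2c column are all 3 characters long, the API
# returned ISO3 codes: move them to iso3c and drop iso2c.
if (unique(nchar(dat2$iso2c)) == 3) {
    dat2$iso3c <- dat2$iso2c
    dat2$iso2c <- NULL
}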
With this change, the example above returns the expected output and the package tests pass.
@etiennebacher this makes a lot of sense. Maybe make some allowance for NAs, and for the possibility that a single vector may include both 2- and 3-letter entries; in that case the simple == comparison would produce a logical vector of length greater than one and break the if(). Probably just wrapping the condition in isTRUE() would do the trick.
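Both suggestions can be sketched in base R. move_iso3_if_all() wraps the proposed condition in isTRUE(), so NAs or mixed widths make the test FALSE instead of raising an error; move_iso3_per_row() is an alternative that moves only the 3-letter entries. Both function names are hypothetical, neither is the code that was eventually merged into WDI, and the as.character() calls are an added assumption in case iso2c is a factor.

```r
# Hypothetical sketch 1: the proposed check, guarded with isTRUE().
move_iso3_if_all <- function(dat2) {
  widths <- unique(nchar(as.character(dat2$iso2c)))
  # isTRUE() yields FALSE when the comparison is NA or has length > 1
  if (isTRUE(widths == 3)) {
    dat2$iso3c <- dat2$iso2c
    dat2$iso2c <- NULL
  }
  dat2
}

# Hypothetical sketch 2: move only the 3-letter entries, row by row.
move_iso3_per_row <- function(dat2) {
  codes <- as.character(dat2$iso2c)
  is_iso3 <- !is.na(codes) & nchar(codes) == 3
  if (!"iso3c" %in% names(dat2)) dat2$iso3c <- NA_character_
  dat2$iso3c[is_iso3] <- codes[is_iso3]   # copy ISO3 codes across
  dat2$iso2c[is_iso3] <- NA               # blank them out of iso2c
  dat2
}

mixed <- data.frame(iso2c = c("AO", "AGO", NA), stringsAsFactors = FALSE)
move_iso3_if_all(mixed)    # left unchanged: widths are 2, 3 and NA
move_iso3_per_row(mixed)   # only "AGO" is moved to iso3c
```

The isTRUE() version keeps the all-or-nothing behaviour of the proposal, so a column that mixes ISO3 codes with NAs is left untouched; the per-row version handles that case at the cost of keeping a partly empty iso2c column.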
I'm traveling and won't be able to look into this seriously in the next 2-3 weeks, but I could merge a PR if anyone is so inclined.
Thanks all for the discussion and proposal.
This should now be fixed in the GitHub version of the package. Thanks to @etiennebacher for the fix!!
(Note that this is still kind of a hack, because the actual problem is upstream in the data source.)