Duplicated values for many developing countries

Open Macosso opened this issue 2 years ago • 3 comments

When I extract data, the rows come back duplicated for most developing countries.

For instance, in the code below, Angola, Mozambique, and the Republic of the Congo come back duplicated, with some rows carrying ISO2 codes and others ISO3 codes:

dat <- WDI(indicator=c("DT.ODA.ODAT.GN.ZS","NY.GDP.MKTP.CD","DT.ODA.ODAT.GD.ZS",
                       "DT.ODA.ALLD.GD.ZS","DT.ODA.ODAT.XP.ZS","DT.ODA.ALLD.XP.ZS","NY.GNP.MKTP.CD"),
           country = c("AO","CG","SA","MZ"), start=1980, end=2020)

However, it seems that some indicators are assigned to different ISO codes, as the code below does not return duplicates for any of those countries:

dat <- WDI(indicator=c("DT.ODA.ODAT.GN.ZS"),
           country = c("AGO","CG","SA","MZ"), start=1980, end=2020)

Macosso avatar Oct 24 '21 13:10 Macosso

Thanks for the report. Does one of those indicators return 3-letter codes when you call the WDI function with just that one indicator?

If so, this would indicate that there is a problem in the World Bank API, rather than in the WDI package (which is what I suspect, actually).
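One way to run that per-indicator check is sketched below. The helper name `code_lengths` and its `fetch` argument are made up for illustration and are not part of the WDI package; the sketch assumes each single-indicator call returns a data frame with an `iso2c` column, as in the affected version discussed here.

```r
# Hypothetical helper: for each indicator, fetch it on its own and report
# the distinct character lengths found in the `iso2c` column
# ("2" = 2-letter codes only, "3" = 3-letter codes only, "2,3" = mixed).
# `fetch` defaults to WDI::WDI, so it can be swapped out for offline testing.
code_lengths <- function(indicators, country, fetch = WDI::WDI, ...) {
  vapply(indicators, function(ind) {
    d <- fetch(indicator = ind, country = country, ...)
    lens <- sort(unique(nchar(as.character(d$iso2c))))
    paste(lens, collapse = ",")
  }, character(1))
}

# Example call (needs the WDI package and network access):
# code_lengths(c("DT.ODA.ODAT.GN.ZS", "DT.ODA.ODAT.GD.ZS"),
#              country = c("AO", "CG", "SA", "MZ"), start = 1980, end = 2020)
```

Any indicator reported with a length of 3 would be one where the API puts the 3-letter code in the field that WDI reads as `iso2c`.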

vincentarelbundock avatar Oct 24 '21 20:10 vincentarelbundock

Hi. Yes, many of these indicators return iso2 codes and some return iso3. Two indicators that return iso3 are "DT.ODA.ODAT.GD.ZS" and "DT.ODA.ALLD.GD.ZS". I believe there are many more, but I did not check all of them.

Macosso avatar Oct 28 '21 06:10 Macosso

Diagnostic notes:

This code line in WDI assigns the first field of country (named "id") to "iso2c".

In this query, id is an "iso3c" code, and the countryiso3code field is empty:

https://api.worldbank.org/v2/en/country/AGO;COG;SAU;MOZ/indicator/DT.ODA.ODAT.GD.ZS?format=json

In this query, id is an "iso2c" code, and the countryiso3code field is filled with a valid "iso3c" code:

https://api.worldbank.org/v2/en/country/AO;CG;SA;MZ/indicator/DT.ODA.ODAT.GN.ZS?format=json

vincentarelbundock avatar Nov 01 '21 16:11 vincentarelbundock

Hi @vincentarelbundock, I think you can fix this by putting the following lines after defining dat2 in wdi.dl():

if (unique(nchar(dat2$iso2c)) == 3) {
      dat2$iso3c <- dat2$iso2c
      dat2$iso2c <- NULL
}

The example above then returns the expected output, and the tests pass.

etiennebacher avatar Nov 29 '22 15:11 etiennebacher

@etiennebacher this makes a lot of sense. Maybe make some allowance for NAs and for the possibility that a single vector may include both 2 and 3-letter entries, in which case a simple == comparison would produce a boolean vector and break. Probably just wrapping in isTRUE would do the trick.
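A minimal sketch of that more defensive check, written as a standalone function for illustration. The name `fix_iso_codes` is hypothetical, and this is not necessarily the code that was merged into the package; it only assumes `dat2` is a data frame with an `iso2c` column.

```r
# Hypothetical helper: move the codes from `iso2c` to `iso3c` only when
# every non-missing entry has exactly 3 characters.
fix_iso_codes <- function(dat2) {
  codes <- as.character(dat2$iso2c)
  # Drop NAs first so missing codes do not affect the length check
  lens <- unique(nchar(codes[!is.na(codes)]))
  # length(lens) == 1 rules out mixed 2- and 3-letter vectors (and the
  # all-NA case); isTRUE() keeps the condition a single TRUE/FALSE
  if (length(lens) == 1 && isTRUE(lens == 3)) {
    dat2$iso3c <- dat2$iso2c
    dat2$iso2c <- NULL
  }
  dat2
}

# 3-letter codes with an NA: the column is renamed
fixed <- fix_iso_codes(data.frame(iso2c = c("AGO", "MOZ", NA), value = 1:3))
names(fixed)

# Mixed 2- and 3-letter codes: the data frame is left untouched
mixed <- fix_iso_codes(data.frame(iso2c = c("AO", "MOZ"), value = 1:2))
names(mixed)
```

With a mixed vector the function deliberately does nothing, since it cannot tell which column the codes belong in.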

I'm traveling and won't be able to look into this seriously in the next 2-3 weeks, but I could merge a PR if anyone is so inclined.

Thanks all for the discussion and proposal.

vincentarelbundock avatar Nov 29 '22 16:11 vincentarelbundock

This should now be fixed in the GitHub version of the package. Thanks to @etiennebacher for the fix!!

(Note that this is still kind of a hack, because the actual problem is upstream in the data source.)

vincentarelbundock avatar Dec 01 '22 21:12 vincentarelbundock