passport: non-logical datasets shouldn't start with "is_"

Open alistaire47 opened this issue 6 years ago • 0 comments

Currently is_developed and is_independent return character vectors:

unique(passport:::countries$is_developed)
#> [1] NA           "Developed"  "Developing"

unique(passport:::countries$is_independent)
#>  [1] NA                       "Yes"                   
#>  [3] "Territory of GB"        "International"         
#>  [5] "Territory of US"        "Part of NL"            
#>  [7] "Part of FI"             "Part of FR"            
#>  [9] "Territory of NO"        "Territory of AU"       
#> [11] "Associated with NZ"     "In contention"         
#> [13] "Part of DK"             "Crown dependency of GB"
#> [15] "Part of CN"             "Commonwealth of US"    
#> [17] "Territory of FR"        "Territory of NZ"       
#> [19] "Territories of US"

If they're going to start with is_, they should really return logical vectors. To address the issue, they could:

- drop information to actually return a logical
- get renamed
- be split in two, e.g. is_independent and dependency_status
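As a rough illustration of the third option, the sketch below splits a status vector into a logical flag and a character detail column. It uses a small toy vector with values copied from the output above rather than the package data, and the name dependency_status is only the one floated in this issue, not an existing column.

```r
# Toy stand-in for passport:::countries$is_independent (values taken from the
# output above); this is not the real dataset.
status <- c(NA, "Yes", "Territory of GB", "Part of NL")

# Logical flag: TRUE only for "Yes"; missing values stay NA.
is_independent <- status == "Yes"

# Hypothetical companion column: keeps the detail for non-independent entries
# and is NA where the flag is TRUE or missing.
dependency_status <- ifelse(is_independent, NA_character_, status)

split_cols <- data.frame(is_independent, dependency_status,
                         stringsAsFactors = FALSE)
print(split_cols)
```

How values such as "International" or "In contention" should map to the logical flag is a judgment call this sketch does not settle; mapping them to NA rather than FALSE would be one possibility.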

None of these options is really ideal, as as_country_code and as_country_name are usually expected to return a character vector or factor. Logical columns would not be the only exceptions to that expectation, though; several columns are already non-character:

code_types <- sapply(passport:::countries, typeof) 

code_types[code_types != 'character']
#>                        gaul              un_region_code 
#>                    "double"                   "integer" 
#>           un_subregion_code un_intermediate_region_code 
#>                   "integer"                   "integer" 
#>                         m49                         ldc 
#>                   "integer"                   "logical" 
#>                        lldc                        sids 
#>                   "logical"                   "logical"

Numeric country codes (gaul, un_*_code, m49) are a different issue. Perhaps they should be stored as strings, since they are identifiers and should never be used in arithmetic. Converting them to factors, however, is potentially very confusing and may merit a warning or message.
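To show why factors of numeric codes can be confusing, here is a minimal base R sketch. The three code values are illustrative only and are not read from the package data.

```r
# Illustrative numeric codes (not taken from passport:::countries).
codes <- c(840L, 4L, 250L)

# Converting to a factor keeps the labels but stores level indices internally.
f <- factor(codes)

# as.integer() on a factor returns the level indices, not the original codes.
indices <- as.integer(f)
print(indices)  # 3 1 2

# Recovering the codes requires going through the character labels first.
recovered <- as.integer(as.character(f))
print(recovered)  # 840 4 250

# Storing the codes as strings avoids the ambiguity altogether.
code_strings <- as.character(codes)
print(code_strings)
```

A user who calls as.integer() on such a factor silently gets the wrong numbers, which is the kind of surprise a warning or message could flag.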

Country groupings (ldc, lldc, sids, un_*_code) will be addressed by #1 (though they face the same type issue).

These two columns, is_developed and is_independent (plus a lot more), should be split into a separate dataset of country attributes (#3), but the naming issue will still have to be addressed within that dataset.

This will be a breaking change, but integrating the change with #3 will minimize disruption.

Jul 12 '17 19:07 alistaire47
