Output unification from geocode

Open kadyb opened this issue 5 years ago • 5 comments

Depending on the queried object, geocode() returns lists with different sets of attributes. It would be better if the output were unified. Note that either a single list or multiple lists can be returned (the latter when n objects with the same name exist in different locations).

library("rgugik")

# place
lapply(geocode(address = "Marki")[1], names)
#> $`1`
#> [1] "city"         "teryt"        "simc"         "voivodeship" 
#> [5] "county"       "commune"      "x"            "y"           
#> [9] "geometry_wkt" "accuracy"     "id"           "jednostka" 

# place and street
names(geocode(address = "Marki, Andersa")) # note: the result is a line, not a point
#> [1] "street"       "teryt"        "simc"         "ulic"        
#> [5] "city"         "x"            "y"            "geometry_wkt"
#> [9] "accuracy"     "id"           "jednostka"   

# place, street and house number
names(geocode(address = "Marki, Andersa 1"))
#> [1] "city"            "citypart"        "street"         
#> [4] "number"          "teryt"           "simc"           
#> [7] "ulic"            "code"            "jednostka"      
#> [10] "x"               "y"               "geometry_wkt"   
#> [13] "accuracy"        "city_accuracy"   "street_accuracy"
#> [16] "id"   

# place and house number
names(geocode(address = "Królewskie Brzeziny 13"))
#> [1] "city"          "citypart"      "street"       
#> [4] "number"        "teryt"         "simc"         
#> [7] "ulic"          "code"          "jednostka"    
#> [10] "x"             "y"             "geometry_wkt" 
#> [13] "accuracy"      "city_accuracy" "id"  

# physiographic object 1
names(geocode(geoname = "Las Mierzei"))
#> [1] "name"         "genitive"     "voivodeship"  "county"      
#> [5] "commune"      "class"        "type"         "status"      
#> [9] "geometry_wkt" "x"            "y"            "accuracy"  

# physiographic object 2
lapply(geocode(geoname = "Jeziorak")[1], names)
#> [1] "name"         "genitive"     "voivodeship"  "county"      
#> [5] "commune"      "class"        "type"         "status"      
#> [9] "notes"        "geometry_wkt" "x"            "y"           
#> [13] "accuracy"

For the "road" and "rail crossing" cases, the output is OK.

kadyb avatar Aug 09 '20 17:08 kadyb

@kadyb I can think of two solutions:

  1. Find one set of common variables (I would guess ~20 based on your examples above). Create an object template with these ~20 variables (columns) and then fill it with the downloaded data. If a query does not return a given variable, that variable stays NA.
  2. Split the possible results into several groups (as above) and treat each group independently.
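
A minimal sketch of option 1, assuming each result is a named list. The helper `unify_result()` and the short `template_fields` vector are hypothetical (not part of rgugik), the field names are taken from the examples in this issue, and the values in the mock results are placeholders rather than real geocoder output:

```r
# Hypothetical template; the real one would be the union of all field
# names the geocoder can return (~20 according to the estimate above).
template_fields <- c("city", "street", "number", "teryt", "simc", "ulic",
                     "geometry_wkt")

# Hypothetical helper: fit one result (a named list) to the template.
# Fields missing from the result become NA; fields outside the template
# are dropped.
unify_result <- function(result, fields = template_fields) {
  out <- setNames(vector("list", length(fields)), fields)
  for (f in fields) {
    value <- result[[f]]
    out[[f]] <- if (is.null(value)) NA else value
  }
  out
}

# Mock results standing in for geocode() output (placeholder values)
place <- list(city = "Marki", teryt = "t1", simc = "s1",
              geometry_wkt = "wkt1", id = "i1")
street <- list(street = "Andersa", teryt = "t2", simc = "s2", ulic = "u2",
               city = "Marki", geometry_wkt = "wkt2", id = "i2")

unified <- lapply(list(place, street), unify_result)

# Every result now has the same fields in the same order,
# so the results can be stacked into one table.
unified_df <- do.call(
  rbind,
  lapply(unified, as.data.frame, stringsAsFactors = FALSE)
)
```

With a shared template, the same helper could be reused for every query type, which would also reduce the amount of per-case code.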

What do you think about this issue?

Nowosad avatar Aug 16 '20 16:08 Nowosad

For your information: we currently have a lot of duplicated code, so we should create a helper function.

kadyb avatar Sep 10 '20 13:09 kadyb

BTW, is it possible for geocodePL_get() to return an sf object instead of a list? That would be very useful for speeding up processing and merging with other data.

Fixed in https://github.com/kadyb/rgugik/pull/43.

kadyb avatar Oct 28 '20 13:10 kadyb

Concerning this issue, maybe we could leave this to the user? We do not know which columns they will use or find useful for their analysis.

BERENZ avatar Oct 31 '20 10:10 BERENZ

I think we can safely remove the columns that carry no useful information: x, y, id, jednostka, accuracy, city_accuracy, street_accuracy.
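
As a rough sketch of that clean-up, assuming a result is a named list: the helper `drop_fields()` is hypothetical (not an rgugik function), and the mock result only borrows the field names from the "place and street" example, with placeholder values:

```r
# Columns listed above as safe to remove
drop_cols <- c("x", "y", "id", "jednostka", "accuracy",
               "city_accuracy", "street_accuracy")

# Hypothetical helper: remove the listed fields from one result.
# Names that are not present in the result are simply ignored.
drop_fields <- function(result, drop = drop_cols) {
  result[setdiff(names(result), drop)]
}

# Mock result (placeholder values, not real geocoder output)
mock <- list(street = "a", teryt = "b", simc = "c", ulic = "d", city = "e",
             x = 1, y = 2, geometry_wkt = "f", accuracy = "g", id = "h",
             jednostka = "i")

cleaned <- drop_fields(mock)
names(cleaned)
```

Because `setdiff()` keeps the order of its first argument, the remaining fields stay in their original order.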

kadyb avatar Oct 31 '20 10:10 kadyb