Duplicated variable columns in geospatial data
Currently, the geospatial datasets for different years have the following columns:
year / variable | 2003 | 2006 | 2010 | 2013 | 2016 | 2021 |
---|---|---|---|---|---|---|
id | x | x | x | x | x | x |
LEVL_CODE | x | x | x | x | x | x |
NUTS_ID | x | x | x | x | x | x |
CNTR_CODE | x | x | x | x | x | x |
NAME_LATN | x | x | x | x | x | |
NUTS_NAME | x | x | x | x | x | x |
MOUNT_TYPE | x | x | | | | |
URBN_TYPE | x | x | | | | |
COAST_TYPE | x | x | | | | |
FID | x | x | x | x | x | x |
geometry | x | x | x | x | x | x |
geo | x | x | x | x | x | x |
Of these, at least in years 2016 and 2021, the following variables contain identical information: `id`, `NUTS_ID`, `FID` and `geo`. The `id` column is the unique identifier from the geojson and is not included in the csv file. The `geo` column is generated at the end of `get_eurostat_geospatial` "for easier joins with dplyr", as well as in the data generation script `data_spatial.R`.
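To make the claim about duplicated identifiers easy to re-check, here is a minimal sketch in base R. It uses a small toy data frame as a stand-in for the attribute table of one geospatial dataset; the values are hypothetical and only the column names come from the table above. With a real `sf` object the geometry column would presumably have to be dropped first (for example with `sf::st_drop_geometry()`) before comparing columns this way.

```r
# Toy stand-in for the attribute table of a geospatial dataset.
# The values are hypothetical; only the column names are taken from the issue.
nuts <- data.frame(
  id      = c("AT", "BE", "FI"),
  NUTS_ID = c("AT", "BE", "FI"),
  FID     = c("AT", "BE", "FI"),
  geo     = c("AT", "BE", "FI"),
  stringsAsFactors = FALSE
)

id_cols <- c("id", "NUTS_ID", "FID", "geo")

# For each identifier column, check whether its values match NUTS_ID
# element by element (compared as character vectors).
same_as_nuts_id <- vapply(
  nuts[id_cols],
  function(col) identical(as.character(col), as.character(nuts$NUTS_ID)),
  logical(1)
)

print(same_as_nuts_id)
```

If every entry of `same_as_nuts_id` is `TRUE` for a given year, the four columns carry the same information in that dataset.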
While some of this overlap is due to the eurostat data itself containing duplicated columns, is the `geo` column still necessary?
If it can be easily retrieved otherwise when needed (example, maybe?), then I guess it could also be removed.
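As a possible answer to the "example, maybe?" question, the sketch below shows two ways a user might get by without a pre-built `geo` column: joining on differently named keys, or recreating the column in one line. It uses base R and two toy data frames; `stats`, `shapes` and their values are hypothetical and do not come from the package.

```r
# Hypothetical statistics table keyed by `geo`.
stats <- data.frame(
  geo    = c("AT", "FI"),
  values = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# Hypothetical geospatial attribute table that has no `geo` column.
shapes <- data.frame(
  NUTS_ID   = c("AT", "BE", "FI"),
  LEVL_CODE = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Option 1: join on differently named key columns, keeping all regions.
joined <- merge(shapes, stats, by.x = "NUTS_ID", by.y = "geo", all.x = TRUE)

# Option 2: recreate the `geo` column from `NUTS_ID` when it is needed.
shapes$geo <- shapes$NUTS_ID

print(joined)
```

With dplyr, the counterpart of option 1 would be along the lines of `dplyr::left_join(shapes, stats, by = c("NUTS_ID" = "geo"))`. Either way, the extra `geo` column looks like a convenience rather than a necessity.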
Addressed partly in the v4-dev branch and PR #264. The `geo` column is now marked in the `get_eurostat_geospatial` function documentation as "Questioning", which gives us some more time to discuss whether we should remove it or keep it in the future.
I will close this issue when v4 is released.