RoadDetections
Point Geometry in USA data
The USA data here has the following issue:
- Unlike the data for all other regions, 204,789 of the 54,484,737 GeoJSON entries in the USA data are invalid: each is a zero-length LineString with a single coordinate value where a Point geometry element would be expected, which causes issues when trying to process the files
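For anyone who wants to check their own copy, here is a minimal sketch of how such entries could be detected. It assumes each entry is available as a GeoJSON string holding either a Feature or a bare geometry; the function name `is_degenerate_linestring` is made up for illustration and how the string is pulled out of the TSV is left to the reader. GeoJSON (RFC 7946) requires a LineString to have two or more positions, so anything with fewer than two distinct positions is flagged.

```python
import json


def is_degenerate_linestring(geojson_text):
    """Return True if the entry is a LineString with fewer than two distinct positions.

    Hypothetical helper: accepts a GeoJSON Feature or a bare geometry as text.
    """
    obj = json.loads(geojson_text)
    # Accept either a Feature wrapper or a bare geometry object.
    geometry = obj.get("geometry") if obj.get("type") == "Feature" else obj
    if not isinstance(geometry, dict) or geometry.get("type") != "LineString":
        return False
    coords = geometry.get("coordinates") or []
    # Positions are lists in JSON; convert to tuples so they can go in a set.
    distinct = {tuple(position) for position in coords}
    return len(distinct) < 2


single = '{"type": "LineString", "coordinates": [[-97.5, 35.4]]}'
valid = '{"type": "LineString", "coordinates": [[-97.5, 35.4], [-97.6, 35.5]]}'
print(is_degenerate_linestring(single), is_degenerate_linestring(valid))
```

Counting the lines of the file for which this returns True would be one way to cross-check the figure above, although the exact count may depend on how the file is parsed.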
There are then the following inconsistencies when compared to the other region data files:
- The region file name is USA.zip, whereas all other region files are called <region>-Full.zip, for example the East Africa region is AfricaEast-Full.zip
- The file name in the zip archive is _USA.tsv, whereas the files in all other region archives are called <region>-Full.tsv, for example the East Africa region is AfricaEast-Full.tsv
- Unlike all other regions' data, the first column in the USA region TSV file does not contain a three-alpha country code, for example GBR for Great Britain
Outwith this, thank you for making this excellent data set available
There are also records that contain identical geom values to other records, and LINESTRINGs with 2 or more points where all points are in the exact same spot. BigQuery rejects the single unique point records when I try to load them in.
There are at least 270,293 records that need to be excluded before the dataset can be loaded into BigQuery. I say at least because I haven't yet managed to filter out every record that BigQuery rejects.
I reduced the precision of each point to six decimal places, made sure each point in each LINESTRING was unique, made sure every LINESTRING had at least two points and de-duplicated every LINESTRING. This filtered out 495,727 records and I've been able to load the remaining records into BigQuery without issue.
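For reference, a minimal sketch of what those four steps could look like in Python. This is not the exact code used for the numbers above. It assumes each LINESTRING has already been parsed into a list of (longitude, latitude) pairs, and it reads "made sure each point was unique" as dropping repeated points while keeping the first occurrence; `clean_linestrings` is an illustrative name.

```python
def clean_linestrings(lines, precision=6):
    """Round, de-duplicate points, drop short lines, and de-duplicate lines.

    `lines` is an iterable of coordinate lists, each a list of (lon, lat) pairs.
    Returns a list of tuples of rounded points, in first-seen order.
    """
    seen_lines = set()
    kept = []
    for coords in lines:
        seen_points = set()
        points = []
        for lon, lat in coords:
            # Step 1: reduce precision to `precision` decimal places.
            point = (round(lon, precision), round(lat, precision))
            # Step 2 (assumption): keep only the first occurrence of each point.
            if point not in seen_points:
                seen_points.add(point)
                points.append(point)
        # Step 3: a LINESTRING needs at least two points.
        if len(points) < 2:
            continue
        # Step 4: skip any LINESTRING already seen.
        key = tuple(points)
        if key in seen_lines:
            continue
        seen_lines.add(key)
        kept.append(key)
    return kept


example = [
    [(0.12345678, 1.0), (0.12345679, 1.0)],  # one point after rounding
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0)],    # duplicate of the line above once cleaned
    [(2.0, 2.0)],                            # single point
]
print(clean_linestrings(example))
```

Rounding happens before the uniqueness check on purpose, since points that differ only beyond the sixth decimal place collapse into one and can turn a two-point line into a single point.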
@anisotropi4, @marklit I will look into this issue and, if I confirm it, will fix and reupload the file