RoadDetections
Consider partitioning countries at the file level rather than marking countries in a TSV row
Oceania-Full.zip is currently 282 MB. If its data were partitioned into one GeoJSON file per country and each file were sorted, the ZIP file would be 244 MB instead. That would make the download faster, and people could extract only the countries they're interested in, which uses less disk space. The GeoJSON would also open right away in QGIS and other GIS software without first needing to ETL the TSV.
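For reference, the partitioning step could look something like the sketch below. It assumes each TSV row holds the 3-letter country code in the first column and one GeoJSON feature in the second; the file name `sample.tsv` and its three rows are made up for illustration and are not taken from the real dataset.

```shell
# Hypothetical sample standing in for the real TSV.
# Assumed layout: <3-letter code><TAB><one GeoJSON feature per line>.
printf 'AUS\t{"type":"Feature","id":2}\nNZL\t{"type":"Feature","id":1}\nAUS\t{"type":"Feature","id":1}\n' > sample.tsv

# Write column 2 of every row to a file named after column 1.
awk -F'\t' '{ out = $1 ".geojson"; print $2 > out }' sample.tsv

# Sort each per-country file so similar lines sit next to each other
# before compression.
for f in AUS NZL; do
    LC_ALL=C sort "$f.geojson" > "$f.sorted.geojson"
done
```

With only a handful of countries per region, keeping one output file open per country in awk should not run into open-file limits.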
$ vi a.sh
sort AUS.geojson > AUS.sorted.geojson
sort NZL.geojson > NZL.sorted.geojson
sort PNG.geojson > PNG.sorted.geojson
sort VUT.geojson > VUT.sorted.geojson
sort FJI.geojson > FJI.sorted.geojson
sort SLB.geojson > SLB.sorted.geojson
sort TON.geojson > TON.sorted.geojson
sort WSM.geojson > WSM.sorted.geojson
sort FSM.geojson > FSM.sorted.geojson
sort KIR.geojson > KIR.sorted.geojson
sort PLW.geojson > PLW.sorted.geojson
sort MHL.geojson > MHL.sorted.geojson
sort TUV.geojson > TUV.sorted.geojson
sort NRU.geojson > NRU.sorted.geojson
$ cat a.sh | xargs -P4 -I% bash -xc '%'
$ zip -9 Oceania.sorted.zip \
AUS.sorted.geojson \
NZL.sorted.geojson \
PNG.sorted.geojson \
VUT.sorted.geojson \
FJI.sorted.geojson \
SLB.sorted.geojson \
TON.sorted.geojson \
WSM.sorted.geojson \
FSM.sorted.geojson \
KIR.sorted.geojson \
PLW.sorted.geojson \
MHL.sorted.geojson \
TUV.sorted.geojson \
NRU.sorted.geojson
$ unzip -l Oceania.sorted.zip
Archive:  Oceania.sorted.zip
    Length     Date     Time   Name
----------  ----------  -----  ----
1071521607  2023-04-10  18:58  AUS.sorted.geojson
 185466598  2023-04-10  18:57  NZL.sorted.geojson
  28007237  2023-04-10  18:57  PNG.sorted.geojson
   6470562  2023-04-10  18:57  VUT.sorted.geojson
   5832797  2023-04-10  18:57  FJI.sorted.geojson
   4423195  2023-04-10  18:57  SLB.sorted.geojson
   1047604  2023-04-10  18:57  TON.sorted.geojson
   1066450  2023-04-10  18:57  WSM.sorted.geojson
    307308  2023-04-10  18:57  FSM.sorted.geojson
    190892  2023-04-10  18:57  KIR.sorted.geojson
    242639  2023-04-10  18:57  PLW.sorted.geojson
    119872  2023-04-10  18:57  MHL.sorted.geojson
     44300  2023-04-10  18:57  TUV.sorted.geojson
     38006  2023-04-10  18:57  NRU.sorted.geojson
----------                     -------
1304779067                     14 files
$ unzip Oceania.sorted.zip NZL.sorted.geojson
For some of the largest datasets, like Canada and Japan, the 3-letter country identifier is redundant, since every record in those ZIPs is for its respective country.
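A quick way to check whether the country column is redundant in a given file is to count its distinct values. This is only a sketch: `japan-sample.tsv` and its rows are hypothetical, and it assumes the same layout of country code, tab, GeoJSON feature.

```shell
# Hypothetical single-country sample; assumed <code><TAB><feature> layout.
printf 'JPN\t{"type":"Feature","id":1}\nJPN\t{"type":"Feature","id":2}\n' > japan-sample.tsv

# Count distinct values in column 1. A result of 1 means every row
# carries the same country code, so the column adds no information.
distinct=$(cut -f1 japan-sample.tsv | sort -u | wc -l)
echo "distinct country codes: $((distinct))"
```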