duckdb_spatial
duckdb_spatial copied to clipboard
H3 values modified when exported as JSON but not as CSV
I've compiled the H3 extension for DuckDB and loaded it into the DuckDB binary with your extension.
$ git clone https://github.com/isaacbrodsky/h3-duckdb ./duckdb_h3
$ cd ./duckdb_h3
$ git submodule update --init
$ GEN=ninja \
make release
$ /Volumes/Seagate/duckdb_spatial/build/release/duckdb -unsigned aus1.duckdb
install '/Volumes/Seagate/duckdb_h3/build/release/h3.duckdb_extension';
load h3;
load json;
I'm able to pull an example record where the H3_7 value starts with 87be.
.mode line
SELECT geom,
st_centroid(geom) centroid,
st_x(st_centroid(geom)) x,
st_y(st_centroid(geom)) y,
printf('%x', h3_latlng_to_cell(
ST_Y(ST_CENTROID(geom)),
ST_X(ST_CENTROID(geom)),
7)::bigint) h3_7_yx
from roads2
limit 1;
geom = LINESTRING (147.084939479828 -30.7773677126431, 147.0849609375 -30.7773584948334, 147.084971666336 -30.7773400592114)
centroid = POINT (147.08495789089685 -30.777356503454)
x = 147.08495789089685
y = -30.777356503454
h3_7_yx = 87beee8a3ffffff
87beee8a3ffffff is in Australia as I expected. https://what-the-h3index.vercel.app/
I then exported that table with the aggregate counts for each H3_7.
COPY (
SELECT printf('%x',
h3_latlng_to_cell(
ST_Y(ST_CENTROID(geom)),
ST_X(ST_CENTROID(geom)),
7)::bigint) as h3_7,
count(*) count
FROM roads2
GROUP BY 1
) TO 'aus_h3.json';
None of the exported records start with 87be, for some reason only H3_7s prefixed with 87[9ac] are exporting and none of these are anywhere near Australia.
$ jq -S . aus_h3.json | grep h3 | cut -c1-15 | sort | uniq -c
151 "h3_7": "8798
469 "h3_7": "8799
841 "h3_7": "87a4
144 "h3_7": "87a5
520 "h3_7": "87c4
357 "h3_7": "87c5
$ head aus_h3.json
{"h3_7":"87c529c10ffffff","count":6287}
{"h3_7":"87a4c5d6effffff","count":2441}
{"h3_7":"87c431431ffffff","count":19481}
{"h3_7":"87a4c0860ffffff","count":12708}
{"h3_7":"87a4de11cffffff","count":2375}
{"h3_7":"87c519272ffffff","count":4185}
{"h3_7":"87c52b8c4ffffff","count":5311}
{"h3_7":"87a4de95effffff","count":1665}
{"h3_7":"87c433b98ffffff","count":10999}
{"h3_7":"87c52b80dffffff","count":4654}
Does it work if you export the entire original subquery? The export itself doesnt reallt have any knowledge of what its exporting in this case since there is no geometry, so I’m hesitant to believe the json driver is to blame.
I'll walk through it end-to-end and see if I can reproduce it with a single record. Maybe I'm missing something.
I ran through the whole piece again and used zoom level 2 so there are fewer results to look at. I'm wondering if this is a degree to radiants conversion issue.
$ cat aus_h3.json
{"h3_2":"82c52ffffffffff","count":161}
{"h3_2":"82c437fffffffff","count":137}
{"h3_2":"82c50ffffffffff","count":218}
{"h3_2":"82c4affffffffff","count":32}
{"h3_2":"82a4c7fffffffff","count":108}
{"h3_2":"82c427fffffffff","count":81}
{"h3_2":"82a477fffffffff","count":42}
{"h3_2":"82a4dffffffffff","count":63}
{"h3_2":"82a457fffffffff","count":29}
{"h3_2":"82a467fffffffff","count":43}
{"h3_2":"82a4e7fffffffff","count":26}
{"h3_2":"82999ffffffffff","count":7}
{"h3_2":"8298b7fffffffff","count":15}
{"h3_2":"82a54ffffffffff","count":20}
{"h3_2":"82998ffffffffff","count":10}
{"h3_2":"82c51ffffffffff","count":5}
{"h3_2":"82c48ffffffffff","count":1}
{"h3_2":"82c417fffffffff","count":2}
I'm able to get the data out properly using CSV.
$ vi csv_to_json.py
import json
import sys
recs = [x.split(',')
for x in sys.stdin.read().strip().splitlines()]
print(json.dumps([{'count': cnt,
'hexIds': [h3]}
for h3, cnt in recs]))
$ vi h3.sql
.mode csv
LOAD h3;
LOAD json;
SELECT printf('%x',
h3_latlng_to_cell(
ST_Y(ST_CENTROID(geom)),
ST_X(ST_CENTROID(geom)),
2)::bigint) as h3_2,
count(*) count
FROM roads2
GROUP BY 1;
$ /Volumes/Seagate/duckdb_spatial/build/release/duckdb \
-unsigned \
aus.duckdb \
< h3.sql \
| python3 csv_to_json.py \
| tail +2 \
> aus_h3.duckgl.json
wondering if this is a type casting problem? are the JSON values being cast as character, integer, or double? CSV would not be impacted since unlike json it is not typed. do you see this issue in parquet too?