Add TopoJSON Writer
TopoJSON is an extension of GeoJSON that encodes topology for LineString and Polygon data. Rather than representing geometries discretely, geometries in TopoJSON files are stitched together from shared line segments called arcs. TopoJSON is most effective for coverage-style datasets consisting of many polygons that touch each other, like administrative units or cadastral parcels.
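To illustrate the stitching, here is a small hand-written sketch of how a reader could rebuild polygon rings from arcs. The two-square dataset and the helper names `resolve_arc` and `stitch_ring` are made up for this example, and the topology is assumed to be non-quantized (no `transform`), so arcs hold absolute coordinates. A negative arc index `~i` refers to arc `i` traversed in reverse.

```python
# Two unit squares that share the edge from (1, 0) to (1, 1).
# Arc 0 is the shared edge and is stored only once.
topology = {
    "type": "Topology",
    "objects": {
        "squares": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [[0, 1]]},   # left square
                {"type": "Polygon", "arcs": [[-1, 2]]},  # right square, -1 == ~0
            ],
        }
    },
    "arcs": [
        [[1, 0], [1, 1]],                  # arc 0: shared edge
        [[1, 1], [0, 1], [0, 0], [1, 0]],  # arc 1: rest of the left square
        [[1, 0], [2, 0], [2, 1], [1, 1]],  # arc 2: rest of the right square
    ],
}


def resolve_arc(topo, index):
    """Return an arc's coordinates; a negative index ~i means arc i reversed."""
    if index < 0:
        return list(reversed(topo["arcs"][~index]))
    return list(topo["arcs"][index])


def stitch_ring(topo, arc_indexes):
    """Concatenate arcs into one ring, dropping the duplicated joint points."""
    coords = []
    for index in arc_indexes:
        points = resolve_arc(topo, index)
        # The first point of each following arc repeats the previous end point.
        coords.extend(points if not coords else points[1:])
    return coords


left_ring = stitch_ring(topology, [0, 1])
right_ring = stitch_ring(topology, [-1, 2])
```

Both rings come out closed, and the shared edge is present in each of them even though it is written to the file only once.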
In hale studio, a minimal implementation would write only non-shared arcs, i.e. every line or polygon ring gets its own arc. A more advanced implementation would detect shared arcs, encode them once, and then reference them from every feature whose geometry they are part of.
There are several implementations of such tools available in Java or JS:
https://github.com/bouviervj/topojson-j
https://github.com/topojson/topojson
This issue was created for a customer project. Additional requirements and test data will be made available in an internal ticket.
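As a rough sketch of what the minimal variant could look like (not the hale studio implementation), the following turns GeoJSON-like features into a non-quantized topology with one arc per LineString or polygon ring. The function name `to_topology`, the object name `"features"`, and the sample data are assumptions for illustration only.

```python
def to_topology(features, name="features"):
    """Encode LineString/Polygon features with one non-shared arc each."""
    arcs = []
    geometries = []

    def add_arc(coords):
        # Store a copy of the coordinates and return the new arc's index.
        arcs.append([list(point) for point in coords])
        return len(arcs) - 1

    for feature in features:
        geom = feature["geometry"]
        if geom["type"] == "LineString":
            arc_refs = [add_arc(geom["coordinates"])]
        elif geom["type"] == "Polygon":
            # One list of arc indexes per ring (exterior first, then holes).
            arc_refs = [[add_arc(ring)] for ring in geom["coordinates"]]
        else:
            raise ValueError(f"unsupported geometry type: {geom['type']}")
        geometries.append({
            "type": geom["type"],
            "arcs": arc_refs,
            "properties": feature.get("properties") or {},
        })

    return {
        "type": "Topology",
        "objects": {name: {"type": "GeometryCollection", "geometries": geometries}},
        "arcs": arcs,
    }


# Two squares that touch along x = 1; the shared edge ends up in both arcs.
squares = [
    {"type": "Feature", "properties": {"id": "a"}, "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
    {"type": "Feature", "properties": {"id": "b"}, "geometry": {
        "type": "Polygon",
        "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}},
]
topo = to_topology(squares)
```

With this approach the output is valid TopoJSON but no smaller than the GeoJSON input, because the edge between the two squares is stored twice. Detecting and cutting shared arcs is the part the advanced variant would add.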
@stempler would it also be possible/easier to implement this support for the new/existing JSON IO component of hale studio?
@thorsten-reitz An initial version of this was shipped with 5.0, I believe. Would you say that this can be closed?
@florianesser I haven't really tested that. Will do so now.
It doesn't seem to be working well. Here is my test case:
- Load the Basic Hydrography Example
- Export the transformed data
- Inspect the data in an editor
What I see is that all the non-spatial string-type attributes are incorrectly encoded. Here is an example:
"identifier":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
@thorsten-reitz Yes, that's one of the limitations of the current implementation (see related commit message).
@florianesser ah OK, that led to the dataset being hugely inflated (about 80% of the whole file consisted of such markers) and even being unreadable in Notepad++. TBH, I would consider that a blocker for closing this ticket. Empty fields are quite common.
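To gauge how much of an export is affected, a small diagnostic along these lines could list every string value that consists solely of NUL characters. This is only a sketch for inspecting the output, not a fix for the writer; the function name `nul_only_strings` and the sample document are made up for illustration.

```python
import json


def nul_only_strings(value, path=""):
    """Yield (path, length) for each string made up only of NUL characters."""
    if isinstance(value, str):
        if value and set(value) == {"\x00"}:
            yield path, len(value)
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from nul_only_strings(item, f"{path}/{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from nul_only_strings(item, f"{path}/{index}")


# Made-up sample; a real check would use json.load() on the exported file.
sample = json.loads(
    '{"properties": {"identifier": "\\u0000\\u0000\\u0000", "name": "example"}}'
)
hits = list(nul_only_strings(sample))
```

Each hit reports where the padded value sits and how many NUL characters it contains, which makes it easier to name concrete attributes in a bug report.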
@thorsten-reitz We are grooming this issue. Do you have an example of non-spatial string attributes that are causing this issue?