GeoInterface.jl
GeoInterface.jl copied to clipboard
Clarify contracts for writers and readers
In general, users can expect writers (GeoJSON.write, Shapefile.write, GeoParquet.write, GeoDataFrames.write, etc) to accept either feature collections or table-like objects, as *.write(filename, obj). This is not always clear in the READMEs or package docs/docstrings.
This should be clarified in every README for these packages and in the docs as well, if those exist.
Similarly, readers should return a table-like object, possibly a feature collection. At least it should have some metadata. DataAPI.metadata functions should be included so we can propagate crs and geometry column info through e.g. DataFrame(GeoJSON.read(...)).
This is a good first issue to solve since it's mostly documentation modification!
I think making them all as much as possible identical is really important too. Like the same keywords, args etc wherever we can.
Even the docs should be basically the same text for the same functionality.
Much of this is now done, e.g. we have a section in the docs that discusses the metadata keys. Methods are also mostly consistent throughout the ecosystem.
Would be good to have a list somewhere to fully understand the "mostly"part
Shapefile only has the force keyword:
https://github.com/JuliaGeo/Shapefile.jl/blob/9d286a139d74607d45b27cb30a72c7c28c33377e/src/writer.jl#L100
GeoJSON has geometrycolumn but no force keyword:
https://github.com/JuliaGeo/GeoJSON.jl/blob/6aefd67e8a187a8cb447aa82a624f712078732fa/src/io.jl#L54-L55|
FlatGeoBuf.jl has no write support at all
GeoParquet has bbox as a vector argument, (should have extent instead? )geometrycolumn is not yet merged, and crs is also an argument not a keyword?
https://github.com/JuliaGeo/GeoParquet.jl/blob/42e85f47570559286aa5b04ffe86c68cc34fb4ad/src/io.jl#L12
GeoArrow has geocolumns, and crs as a keyword (nice) and encoding which is arrow specific.
https://github.com/JuliaGeo/GeoArrow.jl/blob/2fab9cbd27d8e7c1c1a5aed43897e37daf090730/src/io.jl#L7
I'm hoping we have:
Package.write(filename/io, table/geometry; geometrycolumn, crs, force) for all of them (at least that accept crs), and extra keywords where needed.
Probably GeoJSON.jl should even accept crs and just give a warning that its deprecated and must be in lat/lon.
GeoDataframes has a lot more, with crs and geometrycolumn but not force, I'm assuming that means you can just overwrite files @evetion ? Maybe thats better, idk (terra has overwrite=false like Rasters and Shapefile have force=false by default)
https://github.com/evetion/GeoDataFrames.jl/blob/2a29bd7526d65d994aa82c008e59b016ee78fa0b/src/io.jl#L155-L164
We should probably agree on this exact signature is before making any more PRs on it so its just one change and efficient to roll out.
read seems in all cases less complicated and the keywords are package specific. But not every package has read. Probably they all should? we can just do read(args...; kw...) = Table(args...; kw...) where needed?
I would personally rather have force=true as a default or somehow configurable by preferences. But I can see the argument. Maybe put it to a poll internally where you work and see what folks think?
Yeah, I don't like the possibility of accidentally overwriting files, I think you should always have to type something strongly worded like force=true to show you mean it
Yeah, we have to make a decision though, config is definitely out.
Adding force=false would be a breaking change right now, in that the function will randomly fail if you had written code without this in mind. But how many people would actually write such code? Probably a lot, at least in analysis workflows where you re-run the analysis and save to file. So it's worth some amount of consideration, and a very nice error message..
So, here is a a todo list, assuming the above is generally accepted (edit if not!):
- [ ] Shapefile.jl:
- [ ]
crskeyword, - [ ]
geometrycolumnkeyword - [ ]
read(path::String) = Table(path)
- [ ]
- [ ] GeoJSON.jl:
- [ ]
forcekeyword - [ ]
crswith a warning for consistency
- [ ]
- [ ] GeoParquet.jl:
- [ ]
geometrycolumnkeyword in https://github.com/JuliaGeo/GeoParquet.jl/pull/30 - [ ]
forcekeyword - [ ]
crskeyword should accept any CRS type orString, and call convert on it - [ ] Maybe we should also remove the
read(driver, fn)readand just useread(fn; driver), and_readhandles driver dispatch.
- [ ]
- [ ] GeoArrow.jl:
- [ ]
forcekeyword inwrite, - [ ]
geocolumn=>geometrycolumn(still letgeocolumnwork but undocumented) - [x]
readis already good.
- [ ]
- [ ] GeoDataframes.jl
force=truekeyword ?
@evetion of course looking for your input here! Does this seem like a reasonable plan?
Adding force=false would be a breaking change right now
For sure. Making a very solid agreement here and doing it all at once, or even with a warning first for six months would be the way to go.
The warning could be:
isfile(path) && force == false && @warn "use `force=true` to overwrite files. In future versions this will be an error"
Yeah that sounds good, let's also give that a maxlog=15 for folks who don't have direct control over code.
I reformatted your list into smaller chunks so it's easier to do 5 minute PRs for
I would prefer they were single PRs each package to keep the review overhead a bit lower
We can do that too! But in the forlorn hope that we invite a random passer by....
Oh yeah randoms can totally do any one of those :)