GeoInterface.jl icon indicating copy to clipboard operation
GeoInterface.jl copied to clipboard

Clarify contracts for writers and readers

Open asinghvi17 opened this issue 1 year ago • 1 comments

In general, users can expect writers (GeoJSON.write, Shapefile.write, GeoParquet.write, GeoDataFrames.write, etc) to accept either feature collections or table-like objects, as *.write(filename, obj). This is not always clear in the READMEs or package docs/docstrings.

This should be clarified in every README for these packages and in the docs as well, if those exist.

Similarly, readers should return a table-like object, possibly a feature collection. At least it should have some metadata. DataAPI.metadata functions should be included so we can propagate crs and geometry column info through e.g. DataFrame(GeoJSON.read(...)).

This is a good first issue to solve since it's mostly documentation modification!

asinghvi17 avatar Jun 16 '24 16:06 asinghvi17

I think making them all as much as possible identical is really important too. Like the same keywords, args etc wherever we can.

Even the docs should be basically the same text for the same functionality.

rafaqz avatar Jun 16 '24 16:06 rafaqz

Much of this is now done, e.g. we have a section in the docs that discusses the metadata keys. Methods are also mostly consistent throughout the ecosystem.

asinghvi17 avatar May 15 '25 01:05 asinghvi17

Would be good to have a list somewhere to fully understand the "mostly"part

rafaqz avatar May 15 '25 06:05 rafaqz

Shapefile only has the force keyword: https://github.com/JuliaGeo/Shapefile.jl/blob/9d286a139d74607d45b27cb30a72c7c28c33377e/src/writer.jl#L100

GeoJSON has geometrycolumn but no force keyword: https://github.com/JuliaGeo/GeoJSON.jl/blob/6aefd67e8a187a8cb447aa82a624f712078732fa/src/io.jl#L54-L55|

FlatGeoBuf.jl has no write support at all

GeoParquet has bbox as a vector argument, (should have extent instead? )geometrycolumn is not yet merged, and crs is also an argument not a keyword? https://github.com/JuliaGeo/GeoParquet.jl/blob/42e85f47570559286aa5b04ffe86c68cc34fb4ad/src/io.jl#L12

GeoArrow has geocolumns, and crs as a keyword (nice) and encoding which is arrow specific. https://github.com/JuliaGeo/GeoArrow.jl/blob/2fab9cbd27d8e7c1c1a5aed43897e37daf090730/src/io.jl#L7

I'm hoping we have:

Package.write(filename/io, table/geometry; geometrycolumn, crs, force) for all of them (at least that accept crs), and extra keywords where needed.

Probably GeoJSON.jl should even accept crs and just give a warning that its deprecated and must be in lat/lon.

rafaqz avatar May 15 '25 20:05 rafaqz

GeoDataframes has a lot more, with crs and geometrycolumn but not force, I'm assuming that means you can just overwrite files @evetion ? Maybe thats better, idk (terra has overwrite=false like Rasters and Shapefile have force=false by default)

https://github.com/evetion/GeoDataFrames.jl/blob/2a29bd7526d65d994aa82c008e59b016ee78fa0b/src/io.jl#L155-L164

We should probably agree on this exact signature is before making any more PRs on it so its just one change and efficient to roll out.

read seems in all cases less complicated and the keywords are package specific. But not every package has read. Probably they all should? we can just do read(args...; kw...) = Table(args...; kw...) where needed?

rafaqz avatar May 15 '25 21:05 rafaqz

I would personally rather have force=true as a default or somehow configurable by preferences. But I can see the argument. Maybe put it to a poll internally where you work and see what folks think?

asinghvi17 avatar May 15 '25 21:05 asinghvi17

Yeah, I don't like the possibility of accidentally overwriting files, I think you should always have to type something strongly worded like force=true to show you mean it

rafaqz avatar May 15 '25 21:05 rafaqz

Yeah, we have to make a decision though, config is definitely out.

Adding force=false would be a breaking change right now, in that the function will randomly fail if you had written code without this in mind. But how many people would actually write such code? Probably a lot, at least in analysis workflows where you re-run the analysis and save to file. So it's worth some amount of consideration, and a very nice error message..

asinghvi17 avatar May 15 '25 21:05 asinghvi17

So, here is a a todo list, assuming the above is generally accepted (edit if not!):

  • [ ] Shapefile.jl:
    • [ ] crs keyword,
    • [ ] geometrycolumn keyword
    • [ ] read(path::String) = Table(path)
  • [ ] GeoJSON.jl:
    • [ ] force keyword
    • [ ] crs with a warning for consistency
  • [ ] GeoParquet.jl:
    • [ ] geometrycolumn keyword in https://github.com/JuliaGeo/GeoParquet.jl/pull/30
    • [ ] force keyword
    • [ ] crs keyword should accept any CRS type or String, and call convert on it
    • [ ] Maybe we should also remove the read(driver, fn) read and just use read(fn; driver), and _read handles driver dispatch.
  • [ ] GeoArrow.jl:
    • [ ] force keyword in write,
    • [ ] geocolumn => geometrycolumn (still let geocolumn work but undocumented)
    • [x] read is already good.
  • [ ] GeoDataframes.jl force=true keyword ?

@evetion of course looking for your input here! Does this seem like a reasonable plan?

rafaqz avatar May 15 '25 21:05 rafaqz

Adding force=false would be a breaking change right now

For sure. Making a very solid agreement here and doing it all at once, or even with a warning first for six months would be the way to go.

The warning could be:

isfile(path) && force == false && @warn "use `force=true` to overwrite files. In future versions this will be an error"

rafaqz avatar May 15 '25 21:05 rafaqz

Yeah that sounds good, let's also give that a maxlog=15 for folks who don't have direct control over code.

I reformatted your list into smaller chunks so it's easier to do 5 minute PRs for

asinghvi17 avatar May 15 '25 21:05 asinghvi17

I would prefer they were single PRs each package to keep the review overhead a bit lower

rafaqz avatar May 15 '25 21:05 rafaqz

We can do that too! But in the forlorn hope that we invite a random passer by....

asinghvi17 avatar May 15 '25 21:05 asinghvi17

Oh yeah randoms can totally do any one of those :)

rafaqz avatar May 15 '25 21:05 rafaqz