Store geometries for administrative areas

Open riordan opened this issue 9 years ago • 10 comments

First subset of #1: Storing geometries for admin areas in the document.

riordan avatar Jan 11 '16 16:01 riordan

Still discussing ES vs disk vs S3

riordan avatar Jan 13 '16 16:01 riordan

Current decision is to store them in S3 and provide a link in /place search.

trescube avatar Jan 14 '16 19:01 trescube

@thisisaaronland @heffergm: We're looking to re-serve the Who's on First geojson documents from S3 when someone requests the full details about a particular record that comes from Who's on First.

Today, all documents come from our Elasticsearch index on the /place endpoint, but we'd like to be able to pass along the complete, unadulterated WoF record, not just our version of it (also we'd rather store only the fields we use in ES).

What might that setup look like? What kind of implications would there be for folks looking to use Who's on First from their own setup if we do this?

riordan avatar Jan 14 '16 20:01 riordan

Are we talking about serving the WOF record as an API response, or are you talking about something as a convenience for people to go look at the original geojson?

I think you're going to end up having to reverse proxy the data from S3 via the API, it's all here:

https://s3.amazonaws.com/whosonfirst.mapzen.com/

cc @baldur

heffergm avatar Jan 14 '16 20:01 heffergm

Likely as an API response (since we didn't opt to build a hypermedia-style link structure into our v1), so we'll probably be reverse proxying.

I suppose if we're OK with non-Mapzenners requesting it directly from S3 (and I don't imagine it'll happen often), then having them point at the S3 bucket could do the trick.

riordan avatar Jan 14 '16 20:01 riordan

I think we need to maintain our response format which means we can't just serve the exact WOF record as-is. We'll need to fetch it from S3 and copy the parts we care about into our response object.
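
A rough sketch of that "fetch, then copy the parts we care about" step, in Python for brevity. Everything named here is an assumption for illustration: the `data/` prefix and three-digit path layout under the bucket, the helper names, and the choice of output properties are not taken from the Pelias code or confirmed in this thread.

```python
# Assumption: records live under this prefix of the bucket linked above.
WOF_BASE_URL = "https://s3.amazonaws.com/whosonfirst.mapzen.com/data/"


def wof_relative_path(wof_id: int) -> str:
    """Build a record path by splitting the ID into groups of up to three digits.

    Assumed layout: 85633793 -> 856/337/93/85633793.geojson
    """
    digits = str(wof_id)
    parts = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join(parts) + "/" + digits + ".geojson"


def to_place_feature(wof_record: dict) -> dict:
    """Copy selected parts of a WOF GeoJSON record into our own response shape.

    The output property names are hypothetical, not the real /place format.
    """
    props = wof_record.get("properties", {})
    return {
        "type": "Feature",
        "properties": {
            "source": "whosonfirst",
            "id": str(props.get("wof:id")),
            "name": props.get("wof:name"),
            "layer": props.get("wof:placetype"),
        },
        # The full geometry is passed through untouched.
        "geometry": wof_record.get("geometry"),
    }
```

The API would request `WOF_BASE_URL + wof_relative_path(id)`, parse the body as JSON, and hand the result to `to_place_feature`, so the response format stays ours while the geometry comes straight from the WOF record.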

The alternative to S3 access from API is storing the WOF data locally and reading it from disk when needed. This could be a pain at deploy time because we'd need to copy all of WOF to each API server.

dianashk avatar Jan 14 '16 21:01 dianashk

S3 access from inside AWS is quite fast... I'd be inclined to at least suggest starting there, rather than deal with local clones of a large dataset just to deploy the api.

heffergm avatar Jan 14 '16 22:01 heffergm

Yup, we're all on the same page then. The local clones of data would be plan B. Sounds like plan A is good, though. :raised_hands:

dianashk avatar Jan 14 '16 22:01 dianashk

Let's push this out of the milestone and approach it right afterwards.

riordan avatar Jan 20 '16 15:01 riordan

Hi everyone! A long overdue update here. For some time we considered this issue low priority, since the Mapzen Places API was serving Who's on First geometries already. Now of course Mapzen has shut down, so we should discuss serving geometries again.

Serving them only from the /v1/place endpoint still makes sense, as they can be quite large (100MB for New Zealand). My guess is that the whosonfirst importer can store them in Elasticsearch as plain text without indexing them. The geometries probably total tens of GB for the whole world, so taking advantage of the scalability of Elasticsearch makes sense here.

The /v1/place endpoint would then query for Who's on First records directly by ID, as it does now, and efficiently return the geometry in the response.
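
To make the "stored but not indexed" idea concrete, here is a minimal sketch in Python. It assumes Elasticsearch 5+ mapping syntax, where `"index": false` keeps a field in `_source` without making it searchable (older versions spelled this `"index": "no"` on string fields). The field name `geometry_json` and both helper functions are hypothetical and not part of the current Pelias schema or importer.

```python
import json

# Mapping fragment: the geometry is kept as a plain-text blob and never indexed.
# "geometry_json" is a hypothetical field name.
GEOMETRY_MAPPING = {
    "properties": {
        "geometry_json": {"type": "text", "index": False},
    }
}


def to_es_document(wof_record: dict) -> dict:
    """Importer side: serialize the GeoJSON geometry to a compact string."""
    return {
        "source": "whosonfirst",
        "source_id": str(wof_record["properties"]["wof:id"]),
        "geometry_json": json.dumps(wof_record["geometry"], separators=(",", ":")),
    }


def geometry_from_es_source(source: dict):
    """API side: turn the stored string back into a GeoJSON geometry, if present."""
    raw = source.get("geometry_json")
    return json.loads(raw) if raw is not None else None
```

With something like this, /v1/place could keep fetching the record by ID and attach `geometry_from_es_source(hit["_source"])` to the response, while search queries never touch the geometry field.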

orangejulius avatar Mar 05 '18 20:03 orangejulius