Remove `source_id` field

Open orangejulius opened this issue 5 years ago • 0 comments

Each of our Elasticsearch documents stores the original source ID from the upstream data in two places: as part of the GID in the `_id` field, and in the `source_id` field.
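To illustrate the duplication, here is a minimal Python sketch of recovering the source ID from the GID alone. It assumes the GID has the form `source:layer:id`; the helper name `split_gid` and the sample GID are hypothetical, and this is not code from any Pelias repository.

```python
from typing import Tuple


def split_gid(gid: str) -> Tuple[str, str, str]:
    """Split a GID of the assumed form 'source:layer:id' into its parts."""
    # maxsplit=2 keeps any colons inside the upstream ID intact.
    parts = gid.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a source:layer:id GID: {gid!r}")
    source, layer, source_id = parts
    return source, layer, source_id


# Hypothetical GID for illustration only.
print(split_gid("openstreetmap:venue:node/123"))
```

If that assumption holds, the third element carries the same value that is stored again in `source_id`.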

At a conservative estimate of 10 bytes per record, this probably amounts to about 6GB of duplicated data in a full planet build, so it would make sense to get rid of it when possible.
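As a quick sanity check on that estimate, the two figures together imply a record count. The sketch below reads "6GB" as decimal gigabytes, which is an assumption, and the resulting count is only what the estimate implies, not a measured number.

```python
BYTES_PER_RECORD = 10          # conservative per-record estimate from this issue
DUPLICATED_BYTES = 6 * 10**9   # "about 6GB", read as decimal gigabytes (assumption)


def implied_record_count(total_bytes: int, bytes_per_record: int) -> int:
    """Number of records implied by a total size and a per-record size."""
    return total_bytes // bytes_per_record


print(implied_record_count(DUPLICATED_BYTES, BYTES_PER_RECORD))  # 600000000
```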

Doing this will require changes to the API, pelias/model, and here in pelias/schema.

We'll also have to maintain reasonable backwards compatibility, especially in the API, where the `source_id` field in our GeoJSON responses will still be required.
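One possible shape for that compatibility layer, sketched in Python rather than in the API's own code: prefer a stored `source_id` when the document still has one, and otherwise derive it from `_id`. The function name `geojson_source_id`, the sample hits, and the `source:layer:id` GID layout are assumptions for illustration.

```python
def geojson_source_id(hit: dict) -> str:
    """Return the source_id to expose in a GeoJSON response for one hit."""
    stored = hit.get("_source", {})
    # Documents from older indices may still carry the field.
    if "source_id" in stored:
        return stored["source_id"]
    # Otherwise fall back to the GID, assumed to be 'source:layer:id'.
    parts = hit["_id"].split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"cannot derive source_id from _id {hit['_id']!r}")
    return parts[2]


# Hypothetical hits: one from an old index, one from an index without the field.
old_hit = {"_id": "openstreetmap:venue:node/123", "_source": {"source_id": "node/123"}}
new_hit = {"_id": "openstreetmap:venue:node/123", "_source": {}}
print(geojson_source_id(old_hit), geojson_source_id(new_hit))
```

A fallback like this would let responses keep the same shape while indices built before and after the change coexist.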

orangejulius · Oct 04 '19 19:10