Remove `source_id` field
Each of our Elasticsearch documents stores the original source ID from the upstream data in two places: as part of the GID in the `_id` field, and in the `source_id` field.
Conservatively estimating 10 bytes per record, this probably represents about 6 GB of duplicated data in a full planet build, so it would make sense to remove it where possible.
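As a rough sanity check on that estimate, the two figures together imply a document count on the order of 600 million. The sketch below only restates that arithmetic; the helper name is hypothetical, and it assumes decimal gigabytes (10^9 bytes).

```python
# Back-of-envelope check of the estimate above.
# Assumption: 1 GB = 10**9 bytes (decimal gigabytes).

def implied_record_count(duplicated_bytes: int, bytes_per_record: int) -> int:
    """Return how many records the duplicated-data estimate implies."""
    if bytes_per_record <= 0:
        raise ValueError("bytes_per_record must be positive")
    return duplicated_bytes // bytes_per_record


duplicated_bytes = 6 * 10**9   # about 6 GB of duplicated source IDs
bytes_per_record = 10          # conservative per-record estimate

print(f"{implied_record_count(duplicated_bytes, bytes_per_record):,}")  # 600,000,000
```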
There will be changes required to the API, pelias/model, and here in pelias/schema to do this.
We'll also have to maintain reasonable backwards compatibility, especially in the API, where the `source_id` field in our GeoJSON responses will still be required.
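One way the API could keep returning `source_id` without storing it is to derive it from the GID at response time. The following is a minimal sketch, not the actual Pelias implementation: it is written in Python for brevity, the function name is hypothetical, and it assumes the GID has the form `source:layer:id`, where the trailing ID may itself contain colons.

```python
# Hypothetical helper: recover the source ID from a GID.
# Assumption: GIDs look like "source:layer:id", and the id part
# may contain further ":" characters that must be preserved.

def source_id_from_gid(gid: str) -> str:
    """Return the source ID portion of a GID string."""
    # Split on the first two colons only, so colons inside the ID survive.
    parts = gid.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid GID: {gid!r}")
    return parts[2]


print(source_id_from_gid("openstreetmap:venue:node/123"))  # node/123
```

Limiting the split to two separators matters here, because a naive split on every colon would truncate IDs that contain one.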