schema
enable docvalues for source_id field
draft PR to test the effect of enabling docvalues for the `source_id` field.
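for reference, a rough sketch of what the mapping change amounts to, written as a plain Python dict so it can be inspected without a cluster. `doc_values` is the standard Elasticsearch mapping parameter; the `keyword` field type, the helper name `source_id_mapping` and the surrounding structure are assumptions for illustration and may not match the actual schema in this repo.

```python
import json


def source_id_mapping(doc_values: bool) -> dict:
    """Build a hypothetical mapping fragment for the source_id field.

    The "keyword" type is an assumption; only the doc_values flag
    is the setting this PR is concerned with.
    """
    return {
        "properties": {
            "source_id": {
                "type": "keyword",
                "doc_values": doc_values,
            }
        }
    }


# Mapping fragment with docvalues disabled vs. enabled.
before = source_id_mapping(False)
after = source_id_mapping(True)

print(json.dumps(after, indent=2))
```

the two fragments differ only in the `doc_values` flag, which is what makes a before/after index size comparison meaningful.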
this is motivated by the discussion in https://github.com/pelias/api/pull/1608
it is hoped that this field can be used as an additional sorting criterion, making the ordering of results with the same `_score` value more deterministic, and therefore making testing more stable and predictable.
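a minimal sketch of the tie-breaking idea, assuming the secondary sort is ascending on `source_id`. `search_body` builds a request body using the standard Elasticsearch `sort` syntax, and `order_hits` simulates the same ordering locally so the effect can be seen without a cluster. both function names and the sample hits are hypothetical, not taken from the API code.

```python
def search_body(query: dict) -> dict:
    """Wrap a query with a sort on _score, then source_id as tie-breaker."""
    return {
        "query": query,
        "sort": [
            {"_score": {"order": "desc"}},
            {"source_id": {"order": "asc"}},  # assumed tie-breaker direction
        ],
    }


def order_hits(hits: list) -> list:
    """Local simulation: highest score first, ties broken by source_id."""
    return sorted(hits, key=lambda h: (-h["_score"], h["source_id"]))


# Hypothetical hits: two share the same score, so their relative
# order is only well-defined once the tie-breaker is applied.
hits_a = [
    {"_score": 1.0, "source_id": "way/300"},
    {"_score": 2.0, "source_id": "node/7"},
    {"_score": 1.0, "source_id": "node/12"},
]
hits_b = list(reversed(hits_a))

ordered_a = [h["source_id"] for h in order_hits(hits_a)]
ordered_b = [h["source_id"] for h in order_hits(hits_b)]
print(ordered_a)
print(ordered_b)
```

both input orders produce the same output order, which is the property the tests would rely on. sorting on a field in Elasticsearch generally requires docvalues for that field, hence the mapping change.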
my concerns with this change are:
- greatly increasing the index size (on disk)
- increasing the memory consumption (at runtime, once `docvalues` are loaded)
the `source_id` values are almost entirely unique and non-sequential, so I'd expect to see poor compression.
these concerns are hopefully unwarranted, but we should check them before merging this.
Snapshot size comparison:
