elasticsearch
elasticsearch copied to clipboard
Bulk merge field-caps responses using mapping hash
The most common usage of field-caps is retrieving the field-caps of group indices having the same index mappings. We can speed up the merging process by performing bulk merges for index responses with the same mapping hash.
This change reduces the response time by 10 times in the many_shards benchmark.
-
GET /auditbeat*/_field_caps?fields=*
(single index mapping)
| 50th percentile latency | field-caps | 4420.91 | 374.729 | -4046.19 | ms | -91.52% |
| 90th percentile latency | field-caps | 5126.87 | 402.883 | -4723.98 | ms | -92.14% |
| 99th percentile latency | field-caps | 5529.41 | 576.324 | -4953.08 | ms | -89.58% |
| 100th percentile latency | field-caps | 6096.73 | 643.252 | -5453.48 | ms | -89.45% |
-
GET /*/_field_caps?fields=*
* (i.e. multiple index mappings)
| 50th percentile latency | field-caps-all | 4475.04 | 395.844 | -4079.2 | ms | -91.15% |
| 90th percentile latency | field-caps-all | 5334.01 | 425.248 | -4908.76 | ms | -92.03% |
| 99th percentile latency | field-caps-all | 5628.16 | 606.959 | -5021.2 | ms | -89.22% |
| 100th percentile latency | field-caps-all | 6292.63 | 675.807 | -5616.82 | ms | -89.26% |
Pinging @elastic/es-search (Team:Search)
Hi @dnhatn, I've created a changelog YAML for you.
I'm not sure about this -- even if all indices use ECS I think there are often differences (some index has one field, another is missing it, etc.)? I'm wondering if it's worth generalizing this approach to still work when there are multiple mappings?
I didn't look into how to make this optimization for multiple mappings. I think it will take more time to make multiple mappings as efficient as single mapping.
Also how does this optimization interact with https://github.com/elastic/elasticsearch/pull/84598, I guess it won't be as helpful if we only send one request per mapping hash?
They are unrelated. #84598 sends a single request, but responds with multiple responses that share the underlying map. The merge process will do the same load (without optimizations).
@jtibshirani Thank you for reviews. I prefer to have this optimization in first, then working on a general optimization. We will remove this optimization (pretty small) entirely if the general optimization yields a similar performance. WDYT?
@jtibshirani I've pushed a general optimization for the merging process. The new optimization reduces the response time significantly (9-10 times) for field-caps requests targeting single or multiple index mappings. Can you take another look? Thank you!
@jtibshirani @javanna I've revised this PR. Can you please take another look?
This would also avoid the need to rely on map object equality as we do in the current strategy, which feels a little tricky/ fragile.
I added a comment explaining why we use object equality instead of comparing mapping hashes.
Is the comparison using getIndexMappingHash
much slower than the object equality comparison? I agree with @jtibshirani, this feels a bit fragile as it relies on the deserialization code elsewhere sharing objects, and implementation detail which may change in the future.
Is the comparison using getIndexMappingHash much slower than the object equality comparison? I agree with @jtibshirani, this feels a bit fragile as it relies on the deserialization code elsewhere sharing objects, and implementation detail which may change in the future.
Thanks for looking into this @romseygeek. I've pushed https://github.com/elastic/elasticsearch/pull/86323/commits/51eb9999dacf75f3ee60f1628231cfbf0bd7c46d to remove object equality comparison.
I added realease highlight
because 10x improvement is really impressive!! @dnhatn, you should brag about it.
@jtibshirani @romseygeek @Leaf-Lin thank you for reviews.