elasticsearch Bulk merge field-caps responses using mapping hash

The most common usage of field-caps is retrieving the field-caps of group indices having the same index mappings. We can speed up the merging process by performing bulk merges for index responses with the same mapping hash.

This change reduces the response time by 10 times in the many_shards benchmark.

GET /auditbeat*/_field_caps?fields=* (single index mapping)

|  50th percentile latency |     field-caps |  4420.91  |  374.729  |   -4046.19  | ms |  -91.52% |
|  90th percentile latency |     field-caps |  5126.87  |  402.883  |   -4723.98  | ms |  -92.14% |
|  99th percentile latency |     field-caps |  5529.41  |  576.324  |   -4953.08  | ms |  -89.58% |
| 100th percentile latency |     field-caps |  6096.73  |  643.252  |   -5453.48  | ms |  -89.45% |

GET /*/_field_caps?fields=* * (i.e. multiple index mappings)

|  50th percentile latency | field-caps-all |  4475.04  |  395.844  |   -4079.2   | ms |  -91.15% |
|  90th percentile latency | field-caps-all |  5334.01  |  425.248  |   -4908.76  | ms |  -92.03% |
|  99th percentile latency | field-caps-all |  5628.16  |  606.959  |   -5021.2   | ms |  -89.22% |
| 100th percentile latency | field-caps-all |  6292.63  |  675.807  |   -5616.82  | ms |  -89.26% |

May 01 '22 13:05 dnhatn

Pinging @elastic/es-search (Team:Search)

May 01 '22 14:05 elasticmachine

Hi @dnhatn, I've created a changelog YAML for you.

May 01 '22 14:05 elasticsearchmachine

I'm not sure about this -- even if all indices use ECS I think there are often differences (some index has one field, another is missing it, etc.)? I'm wondering if it's worth generalizing this approach to still work when there are multiple mappings?

I didn't look into how to make this optimization for multiple mappings. I think it will take more time to make multiple mappings as efficient as single mapping.

Also how does this optimization interact with https://github.com/elastic/elasticsearch/pull/84598, I guess it won't be as helpful if we only send one request per mapping hash?

They are unrelated. #84598 sends a single request, but responds with multiple responses that share the underlying map. The merge process will do the same load (without optimizations).

@jtibshirani Thank you for reviews. I prefer to have this optimization in first, then working on a general optimization. We will remove this optimization (pretty small) entirely if the general optimization yields a similar performance. WDYT?

May 02 '22 16:05 dnhatn

@jtibshirani I've pushed a general optimization for the merging process. The new optimization reduces the response time significantly (9-10 times) for field-caps requests targeting single or multiple index mappings. Can you take another look? Thank you!

May 08 '22 01:05 dnhatn

@jtibshirani @javanna I've revised this PR. Can you please take another look?

This would also avoid the need to rely on map object equality as we do in the current strategy, which feels a little tricky/ fragile.

I added a comment explaining why we use object equality instead of comparing mapping hashes.

Oct 15 '22 21:10 dnhatn

Is the comparison using getIndexMappingHash much slower than the object equality comparison? I agree with @jtibshirani, this feels a bit fragile as it relies on the deserialization code elsewhere sharing objects, and implementation detail which may change in the future.

Oct 31 '22 16:10 romseygeek

Is the comparison using getIndexMappingHash much slower than the object equality comparison? I agree with @jtibshirani, this feels a bit fragile as it relies on the deserialization code elsewhere sharing objects, and implementation detail which may change in the future.

Thanks for looking into this @romseygeek. I've pushed https://github.com/elastic/elasticsearch/pull/86323/commits/51eb9999dacf75f3ee60f1628231cfbf0bd7c46d to remove object equality comparison.

Nov 01 '22 00:11 dnhatn

I added realease highlight because 10x improvement is really impressive!! @dnhatn, you should brag about it.

Nov 01 '22 09:11 Leaf-Lin

@jtibshirani @romseygeek @Leaf-Lin thank you for reviews.

Nov 01 '22 16:11 dnhatn

elasticsearch elasticsearch copied to clipboard

Bulk merge field-caps responses using mapping hash

elasticsearch
elasticsearch copied to clipboard