
Parsing input formats beyond JSON...

itiyama opened this issue on Sep 20, 2022 · 1 comment

JSON is, by far, the most popular input format for OpenSearch. JSON is human-readable, so it is easy to test and develop with. Additionally, the OpenSearch ecosystem is built around JSON, with most benchmarks written in JSON and most ingest connectors supporting it.

JSON is, however, much slower to parse and more space-consuming than most binary formats. In a high-throughput ingestion pipeline, most customers do not really care about the x-content type during ingestion as long as it is performant and feature-complete. All they need is a client that can convert documents into OpenSearch bulk API requests and send them off to the cluster.
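For concreteness, here is a minimal sketch of such a bulk call using the JDK's built-in HTTP client; the endpoint, index name, and document fields are placeholders. Today the body must be newline-delimited JSON, and swapping in a binary encoding via the Content-Type header is essentially what this issue proposes:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class BulkExample {
    public static void main(String[] args) throws Exception {
        // A bulk body is newline-delimited JSON: an action line followed by
        // a document line, terminated by a trailing newline.
        String body =
            "{\"index\":{\"_index\":\"nyc_taxis\"}}\n" +
            "{\"fare_amount\":12.5,\"passenger_count\":2}\n";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/_bulk"))
            // Replacing this header with a binary encoding is the idea
            // under discussion; today only JSON variants are accepted.
            .header("Content-Type", "application/x-ndjson")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```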

We ran a benchmark in the past (details below) with CSV as the x-content type for the nyc_taxis workload and found that indexing throughput increased by roughly 17-20%.

Possible improvements?

  1. Make x-content types pluggable so that it is easier to add and test new formats. We will also build OpenSearch Benchmark (OSB) tooling to easily test and compare the various formats (a rough sketch of such an extension point follows this list).
  2. Introduce and test more performant x-content types in OpenSearch, such as Protobuf or Amazon Ion.
  3. Contribute highly performant parsers to open-source connectors like Logstash and Data Prepper to drive adoption of the new protocols.
  4. CBOR is not currently supported for the bulk API. We will figure out how to adapt the bulk format to enable CBOR support in particular, and other binary formats in general.
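To make item 1 concrete, here is a rough sketch of what a pluggable content-type extension point might look like. The names are illustrative only and are not part of the current OpenSearch API surface:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

// Hypothetical extension point; the names are illustrative and are not
// part of the current OpenSearch API surface.
public interface XContentFormatPlugin {

    // MIME type this plugin handles, e.g. "application/cbor" or "text/csv".
    String mediaType();

    // Parse one document from the raw request body into field/value pairs.
    // A production implementation would emit a streaming token sequence
    // rather than materializing a Map, to keep allocations off the hot path.
    Map<String, Object> parseDocument(InputStream body) throws IOException;
}
```

The server would dispatch on the request's Content-Type header to the matching plugin, so new formats could be added and benchmarked without touching the core parsing code.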

Trade-offs?

  1. Supporting more content types means maintaining more code. Adding extensibility at that layer would alleviate this to some extent.
  2. Would these optimizations still be helpful once we implement a streaming API? We could define our own custom wire protocol for streaming, but an optimized document parser would likely still be useful.

Experiment details: We used CSV rather than a binary format like CBOR because CSV data from the luceneutil benchmarks is readily available, and because the bulk API does not support CBOR; the main purpose of the experiment was simply to compare indexing throughput with a different format. Note that the CSV input used a predetermined field ordering matching the mapping, so parsing was faster partly because field names do not have to be re-parsed for every document. [Note: the entire CSV experiment was run in collaboration with Swarnim-singhal from the OpenSearch team.]
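To illustrate why the predetermined ordering helps, here is a minimal sketch of the positional-parsing idea; quoting, escaping, and type conversion are omitted, and the field handling is illustrative only. The column-to-field mapping is resolved once up front, so per-document work is just value splitting, whereas a JSON parser must re-read every field name in every document:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of positional CSV parsing under a known field ordering.
final class PositionalCsvParser {

    private final String[] fields; // resolved once from the known ordering

    PositionalCsvParser(String[] fields) {
        this.fields = fields;
    }

    Map<String, Object> parseLine(String line) {
        String[] values = line.split(",", -1);
        Map<String, Object> doc = new LinkedHashMap<>(fields.length);
        // No field names are read per document, unlike JSON, where every
        // key is re-parsed in every document.
        for (int i = 0; i < fields.length && i < values.length; i++) {
            doc.put(fields[i], values[i]);
        }
        return doc;
    }
}
```

In the results table below, a positive diff means the CSV run improved on the JSON run.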

Metric | Unit | json_run_mean | csv_run_mean | Diff
-- | -- | -- | -- | --
Cumulative indexing time of primary shards | min | 62.09304 | 52.62159 | 15.25%
Min cumulative indexing time across primary shards | min | 0 | 0 |  
Median cumulative indexing time across primary shards | min | 20.30717 | 17.38337 | 14.40%
Max cumulative indexing time across primary shards | min | 21.47871 | 17.85485 | 16.87%
Cumulative indexing throttle time of primary shards | min | 0 | 0 |  
Min cumulative indexing throttle time across primary shards | min | 0 | 0 |  
Median cumulative indexing throttle time across primary shards | min | 0 | 0 |  
Max cumulative indexing throttle time across primary shards | min | 0 | 0 |  
Cumulative merge time of primary shards | min | 10.73192 | 8.80159 | 17.99%
Cumulative merge count of primary shards |   | 31.4 | 22 | 29.94%
Min cumulative merge time across primary shards | min | 0 | 0 |  
Median cumulative merge time across primary shards | min | 3.43232 | 2.84041 | 17.25%
Max cumulative merge time across primary shards | min | 3.86728 | 3.12078 | 19.30%
Cumulative merge throttle time of primary shards | min | 4.88534 | 3.39694 | 30.47%
Min cumulative merge throttle time across primary shards | min | 0 | 0 |  
Median cumulative merge throttle time across primary shards | min | 1.5158 | 1.07487 | 29.09%
Max cumulative merge throttle time across primary shards | min | 1.85375 | 1.2472 | 32.72%
Cumulative refresh time of primary shards | min | 1.64562 | 1.86223 | -13.16%
Cumulative refresh count of primary shards |   | 119.2 | 95.2 | 20.13%
Min cumulative refresh time across primary shards | min | 1.67E-05 | 1.67E-05 | 0%
Median cumulative refresh time across primary shards | min | 0.52874 | 0.6034 | -14.12%
Max cumulative refresh time across primary shards | min | 0.58812 | 0.65541 | -11.44%
Cumulative flush time of primary shards | min | 0.08726 | 1.67E-05 | 99.98%
Cumulative flush count of primary shards |   | 19.4 | 12 | 38.14%
Min cumulative flush time across primary shards | min | 1.33E-05 | 0 | 100%
Median cumulative flush time across primary shards | min | 0.02575 | 0 | 100%
Max cumulative flush time across primary shards | min | 0.03574 | 1.67E-05 | 99.95%
Total Young Gen GC | s | 126.5386 | 97.2458 | 23.15%
Total Old Gen GC | s | 0.2732 | 0.2252 | 17.57%
Store size | GB | 6.2157 | 5.80745 | 6.57%
Translog size | GB | 2.56E-07 | 2.56E-07 | 0%
Heap used for segments | MB | 0.30214 | 0.26277 | 13.03%
Heap used for doc values | MB | 0.1131 | 0.09636 | 14.80%
Heap used for terms | MB | 0.11721 | 0.11088 | 5.40%
Heap used for norms | MB | 6.10E-05 | 6.10E-05 | 0%
Heap used for points | MB | 0 | 0 |  
Heap used for stored fields | MB | 0.07177 | 0.05547 | 22.72%
Segment count |   | 103.8 | 98.2 | 5.40%
50th percentile latency | ms | 883.48106 | 751.40026 | 14.95%
90th percentile latency | ms | 1274.41085 | 1031.75011 | 19.04%
99th percentile latency | ms | 2801.06333 | 3005.06845 | -7.28%
99.9th percentile latency | ms | 3293.21416 | 3477.01062 | -5.58%
100th percentile latency | ms | 3727.72659 | 3720.35703 | 0.20%
50th percentile service time | ms | 883.48106 | 751.40026 | 14.95%
90th percentile service time | ms | 1274.41085 | 1031.75011 | 19.04%
99th percentile service time | ms | 2801.06333 | 3005.06845 | -7.28%
99.9th percentile service time | ms | 3293.21416 | 3477.01062 | -5.58%
100th percentile service time | ms | 3727.72659 | 3720.35703 | 0.20%
error rate | % | 0 | 0 |  
  |   |   |   |  
Min Throughput | docs/s | 80651.35 | 94726.228 | 17.45%
Median Throughput | docs/s | 81680.232 | 95967.248 | 17.49%
Max Throughput | docs/s | 83857.414 | 98087.79 | 16.97%

itiyama · Sep 20 '22

I like the idea of extensibility in this space. I feel like I'd want to write a super-optimized parser (e.g., with native code or Rust), even for JSON, and swap out the default one easily.

dblock · Sep 20 '22