logstash
logstash copied to clipboard
Backport PR #16313 to 8.14: Json normalization performance
Backport PR #16313 to 8.14 branch, original message:
Release notes
- Restores performance of JSON-encoding operation by deferring an encoding operation and reducing unnecessary memory copying
What does this PR do?
- Eliminates wasteful computation when encoding JSON
Why is it important/What is the impact to the user?
The safety added by 8.14.1's unicode normalization came at the cost of requiring all deeply-nested strings to be encoded to utf-8 prior to being passed to jrjackson, but separate upstream fixes in jrjackson eliminated this need for most cases.
This PR restores some of the lost performance while keeping identical behaviour. We can chase down the remaining performance gap (which may require a behavior change) in a separate effort.
Checklist
- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- ~~[ ] I have made corresponding changes to the documentation~~
- ~~[ ] I have made corresponding change to the default configuration files (and/or docker env variables)~~
- ~~[ ] I have added tests that prove my fix is effective or that my feature works~~
How to test this PR locally
-
export an environment variable containing deeply-nested JSON structure
export MESSAGE='{"one":{"two":{"three":{"four":{"five":{"six":{"seven":{"eight":{"nine":{"ten":{"eleven":{"twelve":{"thirteen":{"fourteen":{"fifteen":{}}}}}}}}}}}}}}}}' -
use that environment variable as input to a simple logstash pipeline with generator and json codec that invokes
LogStash::Json.dump:bin/logstash -e 'input { generator { threads => 4 message => "${MESSAGE}" codec => json } } filter { ruby { code => "::LogStash::Json.dump(event.to_hash)" } } output { sink { } }' -
observe throughput:
watch --color '(curl --silent -XGET http://localhost:9600/_node/stats | jq --color-output "pick(.process, .flow)")'On my M3 laptop, this translated to:
scenario valid utf-8 preserved utf-8 throughput cpu no normalization (8.13.x) ⚠️ ✅ 440k eps 48% pre-normalization (8.14.2) ✅ ✅ 270k eps 62% jackson normalization (8.14.2) ✅ ⚠️ 440k eps 49% filtered pre-normalization (POC) ✅ ✅ 320k eps 57% reduced copy pre-normalization (this PR) ✅ ✅ 360k eps 55% valid utf-8:- ✅ : output is always valid
UTF-8, a pre-requisite for being valid JSON - ⚠️ : when inputs are non-
UTF-8or invalid-UTF-8, output can be corrupt
- ✅ : output is always valid
preserved utf-8:- ✅ : valid
UTF-8sequences in binary-flagged strings are preserved - ⚠️ : each byte in a valid multibyte
UTF-8sequences in binary-flagged strings is replaced with the unicode replacement character (lossy)
- ✅ : valid
I've converted this 8.14 backport to a draft so that we can be intentional about merging it.
- If a BC2 for 8.14.3 is planned: then this should be merged and DRA staging should be kicked off to include it
- Else if 8.14.4 is planned: then this PR should be merged after 8.14.3 is tagged
- Else this PR should be closed unmerged.
Quality Gate passed
Issues
0 New issues
0 Fixed issues
0 Accepted issues
Measures
0 Security Hotspots
No data about Coverage
No data about Duplication