Backport PR #16482 to 8.16: Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger than sizeLimit
Backport PR #16482 to 8.16 branch, original message:
Release notes
[rn:skip]
What does this PR do?
Updates `BufferedTokenizerExt` so that it can accumulate token fragments coming from different data segments. When a "buffer full" condition is met, it records this state in a local field so that, on the next data segment, it can consume all the token fragments up to the next token delimiter.
Changes the accumulation variable from a `RubyArray` containing strings to a `StringBuilder` that holds the head token; the remaining token fragments are stored in the `input` array.
Ports the tests present at https://github.com/elastic/logstash/blob/f35e10d79251b4ce3a5a0aa0fbb43c2e96205ba1/logstash-core/spec/logstash/util/buftok_spec.rb#L20 to Java.
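
For readers unfamiliar with the tokenizer, the behaviour described above can be illustrated with a minimal, standalone sketch. This is not the code of `BufferedTokenizerExt`: the class `TokenizerSketch`, its fields, and `droppedCount()` are hypothetical names, it uses plain Java strings instead of JRuby types, it assumes a single-character delimiter, and it counts oversized lines instead of raising an error. It only shows the two ideas from the description: a `StringBuilder` accumulating the head token across data segments, and a flag that remembers a "buffer full" state until the next delimiter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration; not the actual BufferedTokenizerExt.
final class TokenizerSketch {
    private final char delimiter;
    private final int sizeLimit;
    // Accumulates the fragments of the token that is still incomplete.
    private final StringBuilder head = new StringBuilder();
    // True while the rest of an oversized line still has to be discarded.
    private boolean bufferFull = false;
    private int dropped = 0;

    TokenizerSketch(char delimiter, int sizeLimit) {
        this.delimiter = delimiter;
        this.sizeLimit = sizeLimit;
    }

    // Returns the complete tokens found in this data segment.
    List<String> extract(String data) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (true) {
            int idx = data.indexOf(delimiter, start);
            boolean complete = idx >= 0;
            int end = complete ? idx : data.length();
            if (!bufferFull) {
                if (head.length() + (end - start) > sizeLimit) {
                    // Oversized line: drop what was accumulated and remember the state.
                    head.setLength(0);
                    bufferFull = true;
                    dropped++;
                } else {
                    head.append(data, start, end);
                }
            }
            if (!complete) {
                break; // no delimiter left; wait for the next segment
            }
            if (bufferFull) {
                bufferFull = false; // the oversized line ends at this delimiter
            } else {
                tokens.add(head.toString());
                head.setLength(0);
            }
            start = idx + 1;
        }
        return tokens;
    }

    int droppedCount() {
        return dropped;
    }

    public static void main(String[] args) {
        TokenizerSketch tokenizer = new TokenizerSketch('\n', 5);
        System.out.println(tokenizer.extract("aaaaaaaa"));       // oversized fragment, no delimiter yet
        System.out.println(tokenizer.extract("bbb\nccc\ndd"));   // tail of the oversized line is skipped
        System.out.println(tokenizer.extract("d\n"));            // "dd" + "d" completes a token
    }
}
```

In this sketch the second call discards `bbb`, because it still belongs to the oversized line, and only then resumes normal tokenization.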
Why is it important/What is the impact to the user?
Fixes the tokenizer so that it works properly when a buffer full condition is met: a line bigger than `sizeLimit` is now completely consumed, up to the next token delimiter.
Checklist
- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- ~~[ ] I have made corresponding changes to the documentation~~
- ~~[ ] I have made corresponding change to the default configuration files (and/or docker env variables)~~
- [x] I have added tests that prove my fix is effective or that my feature works
Author's Checklist
- [x] test as described in #16483
How to test this PR locally
Follow the instructions in #16483
Related issues
- Closes #16483