logstash

Backport PR #16482 to 8.16: Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger than sizeLimit

Open github-actions[bot] opened this issue 4 months ago • 2 comments

Backport of PR #16482 to the 8.16 branch. Original message:


Release notes

[rn:skip]

What does this PR do?

Updates BufferedTokenizerExt so that it can accumulate token fragments coming from different data segments. When a "buffer full" condition is hit, it records this state in a field so that, on the next data segment, it can consume all the token fragments up to the next token delimiter. Changes the accumulation variable from a RubyArray of strings to a StringBuilder that holds the head token, while the remaining token fragments are stored in the input array. Ports the tests at https://github.com/elastic/logstash/blob/f35e10d79251b4ce3a5a0aa0fbb43c2e96205ba1/logstash-core/spec/logstash/util/buftok_spec.rb#L20 to Java.
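The accumulation scheme described above can be illustrated with a small plain-Java sketch. It is a hypothetical, simplified example, not the actual BufferedTokenizerExt code: the class name TokenizerSketch, the discarding flag and the droppedLines counter are assumptions made for illustration, sizeLimit is assumed to be measured in characters, and overflow is reported through a counter rather than an error to keep the example short. A StringBuilder holds the head token; once a line grows past sizeLimit, the flag makes later segments skip fragments until the next delimiter arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified sketch of a size-limited line tokenizer that keeps
 * consuming an oversized line across data segments until its delimiter arrives.
 * Not the actual BufferedTokenizerExt implementation.
 */
public class TokenizerSketch {
    private final String delimiter;
    private final int sizeLimit;                             // assumed: max characters per token
    private final StringBuilder head = new StringBuilder();  // fragment of the current, unterminated token
    private boolean discarding = false;                      // true while skipping the rest of an oversized line
    private int droppedLines = 0;                            // oversized lines discarded so far

    public TokenizerSketch(String delimiter, int sizeLimit) {
        this.delimiter = delimiter;
        this.sizeLimit = sizeLimit;
    }

    /** Feeds one data segment and returns the tokens it completes. */
    public List<String> extract(String data) {
        // limit -1 keeps trailing empty strings, so "a\n" yields ["a", ""]
        String[] parts = data.split(Pattern.quote(delimiter), -1);
        List<String> tokens = new ArrayList<>();
        int start = 0;
        if (discarding) {
            if (parts.length == 1) {
                return tokens; // no delimiter yet: the whole segment belongs to the oversized line
            }
            discarding = false; // parts[0] is the tail of the oversized line: drop it
            start = 1;
        }
        for (int i = start; i < parts.length; i++) {
            boolean terminated = i < parts.length - 1; // a delimiter follows parts[i]
            head.append(parts[i]);
            if (head.length() > sizeLimit) {
                head.setLength(0);
                droppedLines++;
                if (!terminated) {
                    discarding = true; // the line continues: skip fragments in later segments
                }
                continue;
            }
            if (terminated) {
                tokens.add(head.toString());
                head.setLength(0);
            }
        }
        return tokens;
    }

    public int droppedLines() {
        return droppedLines;
    }

    public static void main(String[] args) {
        TokenizerSketch t = new TokenizerSketch("\n", 5);
        System.out.println(t.extract("ab"));          // []
        System.out.println(t.extract("c\nxyz"));      // [abc]
        System.out.println(t.extract("1234"));        // [] ("xyz1234" exceeds the limit)
        System.out.println(t.extract("5678"));        // [] (still inside the oversized line)
        System.out.println(t.extract("9\nok\nnext")); // [ok]
        System.out.println(t.extract("\n"));          // [next]
        System.out.println(t.droppedLines());         // 1
    }
}
```

The discarding flag is the piece that carries state between segments: without it, the tail of an oversized line arriving in a later segment would be emitted in this sketch as if it were a separate line.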

Why is it important/What is the impact to the user?

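Fixes the tokenizer so that it behaves correctly when a buffer full condition is met: a line bigger than sizeLimit is now consumed completely, up to the next delimiter, even when it spans several data segments.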

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • ~~[ ] I have made corresponding changes to the documentation~~
  • ~~[ ] I have made corresponding changes to the default configuration files (and/or docker env variables)~~
  • [x] I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [x] test as described in #16483

How to test this PR locally

Follow the instructions in #16483

Related issues

  • Closes #16483

github-actions[bot], Oct 17 '24 11:10