logstash icon indicating copy to clipboard operation
logstash copied to clipboard

Fix/avoid oom accumulation in bufftok

Open andsel opened this issue 8 months ago • 3 comments

TODO to be applied on top of #17229

Release notes

What does this PR do?

Why is it important/What is the impact to the user?

Checklist

  • [ ] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

1 Test on stdin

Test running the same configuration as in https://github.com/logstash-plugins/logstash-codec-json_lines/pull/43 against this branch, it shouldn't go in OOM.

2 Test on TCP

Use the 1Gb big one line file generated at step 1, with same heap limitation. Run Logstash with following pipeline:

input {
  tcp {
    port => 1234

    codec => json_lines {
      decode_size_limit_bytes => 32768
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

and feed data in with netcat, like in:

cat /path/to/big_single_line.json | netcat localhost 1234

3 Test with file input

Use the 1Gb big one line file generated at step 1, with same heap limitation. Run Logstash with following pipeline:

input {
  file {
    path => "/path/to/big_single_line.json"
    sincedb_path => "/tmp/sincedb"
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/processed.log"

    codec => json_lines {
      decode_size_limit_bytes => 32768
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Related issues

  • Closes #17275

Logs

Before this PR, but with branch in #17229

[2025-03-11T14:57:28,792][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}




java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18156.hprof ...
Heap dump file created [450806605 bytes in 0.253 secs]
[2025-03-11T14:59:11,919][FATAL][org.logstash.Logstash    ][main][59381166257a19a15b5377e6f323f1a71da7efdffecf2017efc6813660a5656e] uncaught error (in thread [main]<stdin)
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:3541) ~[?:?]
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242) ~[?:?]
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587) ~[?:?]
	at java.lang.StringBuilder.append(StringBuilder.java:179) ~[?:?]
	at org.logstash.common.BufferedTokenizer$DataSplitter.append(BufferedTokenizer.java:102) ~[logstash-core.jar:?]
	at org.logstash.common.BufferedTokenizer.extract(BufferedTokenizer.java:135) ~[logstash-core.jar:?]
	at org.logstash.common.BufferedTokenizerExt.extract(BufferedTokenizerExt.java:83) ~[logstash-core.jar:?]
	at java.lang.invoke.LambdaForm$DMH/0x00000008007dd400.invokeVirtual(LambdaForm$DMH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007f5000.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e4800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e4800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder) ~[?:?]
	at Users.andrea.workspace.logstash_plugins.logstash_minus_codec_minus_json_lines.lib.logstash.codecs.json_lines.RUBY$method$decode$0(/Users/andrea/workspace/logstash_plugins/logstash-codec-json_lines/lib/logstash/codecs/json_lines.rb:69) ~[?:?]
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x0000000800bda800.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x000000080084b400.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e8000.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e8000.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder) ~[?:?]
	at Users.andrea.workspace.logstash_andsel.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_input_minus_stdin_minus_3_dot_4_dot_0.lib.logstash.inputs.stdin.RUBY$method$process$0(/Users/andrea/workspace/logstash_andsel/vendor/bundle/jruby/3.1.0/gems/logstash-input-stdin-3.4.0/lib/logstash/inputs/stdin.rb:61) ~[?:?]
	at java.lang.invoke.LambdaForm$DMH/0x0000000800fc1000.invokeStatic(LambdaForm$DMH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008008dd000.invokeExact_MT(LambdaForm$MH) ~[?:?]
	at org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:178) ~[jruby.jar:?]
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:222) ~[jruby.jar:?]
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:228) ~[jruby.jar:?]
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:291) ~[jruby.jar:?]
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:324) ~[jruby.jar:?]
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66) ~[jruby.jar:?]

With this fix:

[2025-03-11T15:05:22,081][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{
       "message" => "Payload bigger than 32768 bytes",
    "@timestamp" => 2025-03-11T14:05:22.944113Z,
          "host" => {
        "hostname" => "andreas-MBP-2.station"
    },
         "event" => {
        "original" => "orn\",\"surname\":\"Borg\"},{\"name\":\"John\",\"surname\":\"McEnroe\"},{\"name\":\"Jimmy\",\"surname\":\"Connors\"}],\"field_2621593\": [{\"name\":\"Jannik\",\"surname\":\"Sinner\"},{\"name\":\"Novak\",\"surname\":\"Djokovic\"},{\"name\":\"Rafa\",\"surname\":\"Nadal\"},{\"name\":\"Roger\",\"surname\":\"Federer\"},{\"name\":\"Pete\",\"surname\":\"Sampras\"},{\"name\":\"Andr\xC3\xA9\",\"surname\":\"Agassi\"},{\"name\":\"Rod\",\"surname\":\"Laver\"},{\"name\":\"Ivan\",\"surname\":\"Lendl\"},{\"name\":\"Bjorn\",\"surname\":\"Borg\"},{\"name\":\"John\",\"surname\":\"McEnroe\"},{\"name\":\"Jimmy\",\"surname\":\"Connors\"}]}\r\n"
    },
          "tags" => [
        [0] "_jsonparsetoobigfailure"
    ],
      "@version" => "1"
}
[2025-03-11T15:05:23,115][INFO ][logstash.javapipeline    ][main] Pipeline terminated {"pipeline.id"=>"main"}
[2025-03-11T15:05:23,599][INFO ][logstash.pipelinesregistry] Removed pipeline from registry successfully {:pipeline_id=>:main}
[2025-03-11T15:05:23,609][INFO ][logstash.runner          ] Logstash shut down.

andsel avatar Mar 10 '25 11:03 andsel

This pull request does not have a backport label. Could you fix it @andsel? 🙏 To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.
  • If no backport is necessary, please add the backport-skip label

mergify[bot] avatar Mar 10 '25 11:03 mergify[bot]

:yellow_heart: Build succeeded, but was flaky

Failed CI Steps

History

  • :green_heart: Build #2559 succeeded e23c26327db7d42795faed6d72c7456c1532b2ed
  • :broken_heart: Build #2558 failed fc3f3e46cb71037c83943e4a1a9c574e04c5241d
  • :broken_heart: Build #2557 failed eea9f37ee25a76e2cc89634cc48a7bae6a04ab7d

cc @andsel

elasticmachine avatar Mar 12 '25 15:03 elasticmachine

Closed because squashed into the parent PR #17229

andsel avatar Jun 18 '25 14:06 andsel