
in_exec: Can't handle non-ASCII characters output

Open • daipom opened this issue 2 months ago • 1 comment

Describe the bug

in_exec cannot handle command output that contains non-ASCII characters.

This is caused by the default encoding settings of child_process_execute, which in_exec uses:

  • external_encoding: ascii-8bit
  • internal_encoding: utf-8
  • encoding_options: invalid: :replace, undef: :replace

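With these settings, non-ASCII characters are always broken. The raw bytes are labeled ASCII-8BIT, which defines no characters for bytes 0x80 to 0xFF, so converting them to UTF-8 with undef: :replace turns every such byte into a replacement character (U+FFFD).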
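The replacement can be reproduced at the string level, without a subprocess. This is a minimal sketch, assuming that String#encode with the same options behaves like the conversion IO#set_encoding configures on the child's stdout; since ASCII-8BIT has no mapping for bytes 0x80 to 0xFF, each such byte is treated as undefined and replaced individually.

```ruby
# Approximates child_process_execute's defaults with String#encode:
# bytes labeled ASCII-8BIT are converted to UTF-8, and invalid/undefined
# characters are replaced (assumption: same behavior as the IO conversion).
raw = "こんにちは\n".b   # 16 bytes: 5 characters x 3 bytes + newline
converted = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

puts raw.bytesize               # 16
puts converted.count("\uFFFD")  # 15: one replacement per non-ASCII byte
puts converted                  # 15 replacement characters, then a newline
```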

https://github.com/fluent/fluentd/blob/1a2759c31efc6bc31f25084c27115722bf5965aa/lib/fluent/plugin_helper/child_process.rb#L65-L72

https://github.com/fluent/fluentd/blob/1a2759c31efc6bc31f25084c27115722bf5965aa/lib/fluent/plugin_helper/child_process.rb#L247-L253

https://github.com/fluent/fluentd/blob/1a2759c31efc6bc31f25084c27115722bf5965aa/lib/fluent/plugin_helper/child_process.rb#L279

We can easily confirm this IO behavior in irb:

irb(main):001:0> require "open3"
=> true
irb(main):002:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 6>, #<IO:fd 7>, #<Process::Waiter:0x00007f7d942fea40 run>]
irb(main):003:0> r_io.read
=> "こんにちは\n"
irb(main):004:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 8>, #<IO:fd 9>, #<Process::Waiter:0x00007f7d942d45b0 run>]
irb(main):005:0> r_io.set_encoding(Encoding::ASCII_8BIT, Encoding::UTF_8, invalid: :replace, undef: :replace)
=> #<IO:fd 9>
irb(main):006:0> r_io.read
=> "���������������\n"
irb(main):007:0> 

I'm wondering if we should fix the implementation of in_exec as follows:

diff --git a/lib/fluent/plugin/in_exec.rb b/lib/fluent/plugin/in_exec.rb
index c2851366..ab514957 100644
--- a/lib/fluent/plugin/in_exec.rb
+++ b/lib/fluent/plugin/in_exec.rb
@@ -74,9 +74,9 @@ module Fluent::Plugin
       super

       if @run_interval
-        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       else
-        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       end
     end

By specifying internal_encoding: nil, we can stop the automatic encoding conversion in child_process_execute. This allows in_exec to handle non-ASCII characters.

Does the current automatic encoding conversion make any sense? One possible reason for it is that event data must be UTF-8. Even if that is the case, I believe it is wrong for in_exec to always convert the command's output to UTF-8 regardless of its actual encoding.
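Even if event data had to be UTF-8, converting from ASCII-8BIT would not be necessary to get there. The following is a hypothetical alternative, not something this issue proposes: label the raw bytes as UTF-8 and scrub only the sequences that are actually invalid, so valid multibyte characters survive. The name to_utf8_preserving_valid is illustrative only.

```ruby
# Hypothetical alternative: treat the command's output bytes as UTF-8 and
# replace only invalid byte sequences, instead of converting from ASCII-8BIT.
def to_utf8_preserving_valid(raw)
  raw.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end

valid   = "こんにちは\n".b  # valid UTF-8 bytes, tagged ASCII-8BIT
invalid = "abc\xFFdef\n".b  # contains one byte that is never valid in UTF-8

puts to_utf8_preserving_valid(valid)   # こんにちは
puts to_utf8_preserving_valid(invalid) # abc, one U+FFFD, then def
```

This still assumes the command writes UTF-8; output in another encoding would need an explicit conversion from that encoding instead.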

To Reproduce

Run Fluentd with the sample configuration under "Your Configuration" below.

Expected behavior

in_exec should also handle command output that contains non-ASCII characters.

Your Environment

- Fluentd version: 1.16.5
- Operating system: Ubuntu 20.04.6 LTS, Windows 10
- Kernel version: 5.15.0-101-generic

Your Configuration

<source>
  @type exec
  command "echo こんにちは"
  tag test
  <parse>
    @type none
  </parse>
</source>

<match test>
  @type stdout
</match>

Your Error Log

(There is no error, but the stdout output is shown here.)

2024-04-03 16:51:59 +0900 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: parsing config file is succeeded path="/test/fluentd/config/in_exec/1.conf"
2024-04-03 16:51:59 +0900 [info]: gem 'fluentd' version '1.16.5'
2024-04-03 16:51:59 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type exec
    command "echo こんにちは"
    tag "test"
    <parse>
      @type "none"
    </parse>
  </source>
  <match test>
    @type stdout
  </match>
</ROOT>
2024-04-03 16:51:59 +0900 [info]: starting fluentd-1.16.5 pid=439655 ruby="3.2.2"
2024-04-03 16:51:59 +0900 [info]: spawn command to main:  cmdline=["/home/daipom/.rbenv/versions/3.2.2/bin/ruby", "-r/home/daipom/.rbenv/versions/3.2.2/lib/ruby/site_ruby/3.2.0/bundler/setup", "-Eascii-8bit:ascii-8bit", "/home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/bin/fluentd", "-c", "/test/fluentd/config/in_exec/1.conf", "--under-supervisor"]
2024-04-03 16:51:59 +0900 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: adding match pattern="test" type="stdout"
2024-04-03 16:51:59 +0900 [info]: adding source type="exec"
2024-04-03 16:51:59 +0900 [info]: #0 starting fluentd worker pid=439675 ppid=439655 worker=0
2024-04-03 16:51:59 +0900 [info]: #0 fluentd worker is now running worker=0
2024-04-03 16:51:59.808444702 +0900 test: {"message":"���������������"}

Additional context

No response

daipom, Apr 03 '24 07:04