fluentd
in_exec: Can't handle non-ASCII characters output
Describe the bug
in_exec cannot handle non-ASCII characters in command output.
This is caused by the encoding settings that child_process_execute uses by default:
- external_encoding: ascii-8bit
- internal_encoding: utf-8
- encoding_options: invalid: :replace, undef: :replace
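This always breaks non-ASCII characters: every byte outside the ASCII range has no mapping in an ascii-8bit to utf-8 conversion, so it is replaced.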
https://github.com/fluent/fluentd/blob/1a2759c31efc6bc31f25084c27115722bf5965aa/lib/fluent/plugin_helper/child_process.rb#L65-L72
https://github.com/fluent/fluentd/blob/1a2759c31efc6bc31f25084c27115722bf5965aa/lib/fluent/plugin_helper/child_process.rb#L247-L253
https://github.com/fluent/fluentd/blob/1a2759c31efc6bc31f25084c27115722bf5965aa/lib/fluent/plugin_helper/child_process.rb#L279
We can easily confirm the IO behavior in irb:
irb(main):001:0> require "open3"
=> true
irb(main):002:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 6>, #<IO:fd 7>, #<Process::Waiter:0x00007f7d942fea40 run>]
irb(main):003:0> r_io.read
=> "こんにちは\n"
irb(main):004:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 8>, #<IO:fd 9>, #<Process::Waiter:0x00007f7d942d45b0 run>]
irb(main):005:0> r_io.set_encoding(Encoding::ASCII_8BIT, Encoding::UTF_8, invalid: :replace, undef: :replace)
=> #<IO:fd 9>
irb(main):006:0> r_io.read
=> "���������������\n"
irb(main):007:0>
I'm wondering if we should fix the implementation of in_exec as follows:
diff --git a/lib/fluent/plugin/in_exec.rb b/lib/fluent/plugin/in_exec.rb
index c2851366..ab514957 100644
--- a/lib/fluent/plugin/in_exec.rb
+++ b/lib/fluent/plugin/in_exec.rb
@@ -74,9 +74,9 @@ module Fluent::Plugin
super
if @run_interval
- child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], &method(:run))
+ child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], internal_encoding: nil, &method(:run))
else
- child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], &method(:run))
+ child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], internal_encoding: nil, &method(:run))
end
end
By specifying internal_encoding: nil, we can stop the automatic encoding conversion in child_process_execute.
This allows in_exec to handle non-ASCII characters.
Does the current automatic encoding conversion make any sense?
One possible reason is that the data is required to be encoded as utf-8.
Even so, I believe it is wrong for in_exec to always convert the command's output to utf-8 regardless of its actual encoding.
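To illustrate the distinction, here is a small sketch in plain Ruby (no Fluentd involved) contrasting conversion from ascii-8bit with simply relabeling the bytes as utf-8. The relabel-and-scrub variant is shown only as one possible way to obtain utf-8 strings without destroying valid text; it is an assumption for illustration, not what Fluentd currently does or what the patch above implements.

```ruby
# Bytes as they might arrive from a child process, tagged as binary.
raw = "こんにちは\n".b

# Conversion: each byte is treated as an ascii-8bit character. Bytes >= 0x80
# have no utf-8 mapping, so every one of them is replaced.
converted = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

# Relabeling: the bytes stay as they are; only the encoding tag changes.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)

# With relabeling, scrub replaces only the bytes that are not valid utf-8.
mixed = "ok \xFF こんにちは".b.force_encoding(Encoding::UTF_8).scrub("?")

puts converted
puts relabeled
puts mixed
```

With relabeling, output that is already valid utf-8 survives intact, and only truly invalid bytes are affected by scrub.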
To Reproduce
Run Fluentd with the sample config shown under "Your Configuration" below.
Expected behavior
in_exec can handle non-ASCII characters in command output as well.
Your Environment
- Fluentd version: 1.16.5
- Operating system: Ubuntu 20.04.6 LTS, Windows 10
- Kernel version: 5.15.0-101-generic
Your Configuration
<source>
@type exec
command "echo こんにちは"
tag test
<parse>
@type none
</parse>
</source>
<match test>
@type stdout
</match>
Your Error Log
(No error, but I put the stdout output here.)
2024-04-03 16:51:59 +0900 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: parsing config file is succeeded path="/test/fluentd/config/in_exec/1.conf"
2024-04-03 16:51:59 +0900 [info]: gem 'fluentd' version '1.16.5'
2024-04-03 16:51:59 +0900 [info]: using configuration file: <ROOT>
<source>
@type exec
command "echo こんにちは"
tag "test"
<parse>
@type "none"
</parse>
</source>
<match test>
@type stdout
</match>
</ROOT>
2024-04-03 16:51:59 +0900 [info]: starting fluentd-1.16.5 pid=439655 ruby="3.2.2"
2024-04-03 16:51:59 +0900 [info]: spawn command to main: cmdline=["/home/daipom/.rbenv/versions/3.2.2/bin/ruby", "-r/home/daipom/.rbenv/versions/3.2.2/lib/ruby/site_ruby/3.2.0/bundler/setup", "-Eascii-8bit:ascii-8bit", "/home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/bin/fluentd", "-c", "/test/fluentd/config/in_exec/1.conf", "--under-supervisor"]
2024-04-03 16:51:59 +0900 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: adding match pattern="test" type="stdout"
2024-04-03 16:51:59 +0900 [info]: adding source type="exec"
2024-04-03 16:51:59 +0900 [info]: #0 starting fluentd worker pid=439675 ppid=439655 worker=0
2024-04-03 16:51:59 +0900 [info]: #0 fluentd worker is now running worker=0
2024-04-03 16:51:59.808444702 +0900 test: {"message":"���������������"}
Additional context
No response