fluentd
fluentd copied to clipboard
Behaviour of `IO#set_encoding` has been changed as of Ruby 3.3
Describe the bug
2 tests of child_process helper always fail in Ruby 3.3.0-dev after https://redmine.ruby-lang.org/issues/18899 is landed.
===============================================================================================================================================================
Failure: test: can scrub characters without exceptions(ChildProcessTest)
/home/aho/Projects/Fluentd/fluentd/test/plugin_helper/test_child_process.rb:531:in `block (2 levels) in <class:ChildProcessTest>'
528: end
529: sleep TEST_WAIT_INTERVAL_FOR_BLOCK_RUNNING until m.locked? || ran
530: m.lock
=> 531: assert_equal Encoding.find('utf-8'), str.encoding
532: expected = "\xEF\xBF\xBD\xEF\xBF\xBD\x00\xEF\xBF\xBD\xEF\xBF\xBD".force_encoding("utf-8")
533: assert_equal expected, str
534: @d.stop; @d.shutdown; @d.close; @d.terminate
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:189:in `block in timeout'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:36:in `block in catch'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:36:in `catch'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:36:in `catch'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:198:in `timeout'
/home/aho/Projects/Fluentd/fluentd/test/plugin_helper/test_child_process.rb:520:in `block in <class:ChildProcessTest>'
<#<Encoding:UTF-8>> expected but was
<#<Encoding:ASCII-8BIT>>
diff:
? #<Encoding:UTF -8 >
? ASCII BIT
? ??? +++
===============================================================================================================================================================
F
===============================================================================================================================================================
Failure: test: can scrub characters without exceptions and replace specified chars(ChildProcessTest)
/home/aho/Projects/Fluentd/fluentd/test/plugin_helper/test_child_process.rb:552:in `block (2 levels) in <class:ChildProcessTest>'
549: end
550: sleep TEST_WAIT_INTERVAL_FOR_BLOCK_RUNNING until m.locked? || ran
551: m.lock
=> 552: assert_equal Encoding.find('utf-8'), str.encoding
553: expected = "??\x00??".force_encoding("utf-8")
554: assert_equal expected, str
555: @d.stop; @d.shutdown; @d.close; @d.terminate
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:189:in `block in timeout'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:36:in `block in catch'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:36:in `catch'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:36:in `catch'
/home/aho/.rbenv/versions/3.3.0-dev/lib/ruby/3.3.0+0/timeout.rb:198:in `timeout'
/home/aho/Projects/Fluentd/fluentd/test/plugin_helper/test_child_process.rb:541:in `block in <class:ChildProcessTest>'
<#<Encoding:UTF-8>> expected but was
<#<Encoding:ASCII-8BIT>>
diff:
? #<Encoding:UTF -8 >
? ASCII BIT
? ??? +++
===============================================================================================================================================================
To Reproduce
$ rbenv install 3.3.0-dev
$ rbenv local 3.3.0-dev
$ bundle exec rake test TEST=test/plugin_helper/test_child_process.rb
Expected behavior
These tests shouldn't fail.
Your Environment
- Fluentd version: 8bb38b5c5cb13d2d9446f94708940f83f56a4b6f
- TD Agent version: N/A
- Operating system: Ubuntu 22.04, Windows 10
- Kernel version: 5.15.0-60-generic
Your Configuration
N/A
Your Error Log
Described in `Describe the bug` section.
Additional context
No response
Recently, it can cause a timeout.
- https://github.com/fluent/fluentd/actions/runs/5108944850/jobs/9183231391?pr=4188
After reading following articles, I've got the reason why these tests are failed on Ruby 3.3:
- https://redmine.ruby-lang.org/issues/18899
- https://github.com/ruby/ruby/pull/6280
IO#set_encoding
ignores the second argument (int_enc
) when ASCII-8bit
is specified as ext_enc
as of Ruby 3.3.
I'm not sure but I'm now doubting reasonability of this Ruby's change because the documentation about Encoding Options
implies possibility of transcoding ASCII-8BIT
-> UTF-8
with invalid: :replace
or undef: :replace
.
I'll open a new issue at https://redmine.ruby-lang.org/issues then ask about it.
I'll describe the detail of this issue at there.
On the other hand, I'm now doubting the specification of Fluent::PluginHelper::ChildProcess#child_process_execute
too.
Although Fluentd's default internal & external encoding are ASCII-8BIT
(workers are executed by ruby -Eascii-8bit:ascii-8bit
), Fluent::PluginHelper::ChildProcess#child_process_execute
uses UTF-8
as default internal encoding. They seem inconsistent. It would be better that Fluentd doesn't modify incoming data implicitly by default. How to modify data should be determined by each plugins & users.
So I'm not sure how to resolve this issue for now. I'm considering to mark these tests as pending until these problems are solved.
Thanks!
I'm considering to mark these tests as pending until these problems are solved.
I agree.
After further investigation, I found that probably it won't work as expected even on Ruby 3.2 or former.
test-set-conding.rb:
#!/usr/bin/env ruby
STDIN.set_encoding('ascii-8bit', 'utf-8', invalid: :replace, undef: :replace)
str = STDIN.read
p str.encoding
p str
p str.force_encoding("ascii-8bit")
Results with the input string foo\xFFbar
are following:
Ruby 3.2:
$ ruby -e 'STDOUT.write("foo\xFFbar")' | ruby test-set-conding.rb
#<Encoding:UTF-8>
"foo�bar"
"foo\xEF\xBF\xBDbar"
"foo�bar"
Ruby 3.3
$ ruby -e 'STDOUT.write("foo\xFFbar")' | ruby test-set-conding.rb
#<Encoding:ASCII-8BIT>
"foo\xFFbar"
"foo\xFFbar"
"foo\xFFbar"
Looking at this result alone, it appears to be working as expected on Ruby 3.2.
But when I test it with the input string あ\xFFい
, the result seems questionable:
Ruby 3.2
$ ruby -e 'STDOUT.write("あ\xFFい")' | ruby test-set-conding.rb
#<Encoding:UTF-8>
"�������"
"\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD"
"�������"
Ruby 3.3
$ ruby -e 'STDOUT.write("あ\xFFい")' | ruby test-set-conding.rb
#<Encoding:ASCII-8BIT>
"\xE3\x81\x82\xFF\xE3\x81\x84"
"\xE3\x81\x82\xFF\xE3\x81\x84"
"あ\xFFい"
Probably the current Fluentd's test expects あ�い
for this byte sequence、but it replaces all bytes to �
even if the input string includes valid UTF-8 sequence in actual.
related issue: #4460
Now I think Ruby 3.3's change seems better than before.
But when I test it with the input string
あ\xFFい
, the result seems questionable:
Probably the current Fluentd's test expects
あ�い
for this byte sequence、but it replaces all bytes to�
even if the input string includes valid UTF-8 sequence in actual.
This result on Ruby 3.2 isn't questionable, because :invalid
and :undef
are for internal_encoding
, so that all non-ASCII characters should be replaced. But such conversion doesn't seem make sense in most cases for ASCII-8Bit (it means binary). It would be better not to modifier input data.
Now I think Ruby 3.3's change seems better than before.
But when I test it with the input string
あ\xFFい
, the result seems questionable:Probably the current Fluentd's test expects
あ�い
for this byte sequence、but it replaces all bytes to�
even if the input string includes valid UTF-8 sequence in actual.This result on Ruby 3.2 isn't questionable, because
:invalid
and:undef
are forinternal_encoding
, so that all non-ASCII characters should be replaced. But such conversion doesn't seem make sense in most cases for ASCII-8Bit (it means binary). It would be better not to modifier input data.
I see! Yes! I think so too!
- internal_encoding
+ external_encoding