seven_zip_ruby
Encoding::UndefinedConversionError on filenames containing non-UTF-8 bytes
seven_zip_ruby 1.3.0 on openSUSE-15.4 (Linux)
https://download.opensuse.org/distribution/leap/15.4/live/openSUSE-Leap-15.4-KDE-Live-x86_64-Media.iso
sudo zypper in ruby2.5-devel gcc-c++
sudo gem install --conservative --no-doc seven_zip_ruby
The problem:
mkdir in
touch in/non_"$(echo -ne '\x80')"_utf8.txt
ruby -e 'require "seven_zip_ruby";
File.open("z.7z", "wb") { |file| SevenZipRuby::Writer.add_directory(file, "in".force_encoding(Encoding::ASCII_8BIT)) }'
Result:
Traceback (most recent call last):
11: from -e:1:in `<main>'
10: from -e:1:in `open'
9: from -e:1:in `block in <main>'
8: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:187:in `add_directory'
7: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:118:in `open'
6: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:188:in `block in add_directory'
5: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:403:in `add_directory'
4: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:403:in `glob'
3: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:403:in `glob'
2: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:410:in `block in add_directory'
1: from /usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:340:in `add_file'
/usr/lib64/ruby/gems/2.5.0/gems/seven_zip_ruby-1.3.0/lib/seven_zip_ruby/seven_zip_writer.rb:340:in `encode': "\x80" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
Without the force_encoding(Encoding::ASCII_8BIT) call, it fails instead with "invalid byte sequence in UTF-8" (ArgumentError).
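Both failures can be reproduced in plain Ruby without the gem, which may help narrow down where the problem lies. This is only a minimal sketch: the helper names `utf8_conversion_error` and `valid_utf8?` are made up for illustration and are not part of seven_zip_ruby. It shows that byte 0x80 cannot be transcoded from ASCII-8BIT to UTF-8, and that the same bytes tagged as UTF-8 do not form a valid sequence.

```ruby
# Hypothetical helper: returns the exception class raised when the bytes,
# tagged as ASCII-8BIT, are transcoded to UTF-8, or nil on success.
def utf8_conversion_error(bytes)
  bytes.b.encode(Encoding::UTF_8) # String#b returns an ASCII-8BIT copy
  nil
rescue EncodingError => e
  e.class
end

# Hypothetical helper: reports whether the same bytes are valid UTF-8.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Same bytes as the filename created by the touch command above.
name = "non_\x80_utf8.txt"

p utf8_conversion_error(name)        # the error class from the traceback
p valid_utf8?(name)                  # false: 0x80 alone is not valid UTF-8
p utf8_conversion_error("plain.txt") # nil: pure ASCII converts fine
```

So neither tagging of the string gets past a step that expects valid UTF-8, which matches the two errors reported here.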
From everything I know about Ruby, filename strings should always be treated as raw bytes, i.e. tagged Encoding::ASCII_8BIT. So setting Encoding::ASCII_8BIT is definitely the right thing to do here.
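For comparison, one lossy way to get a name that survives a UTF-8 conversion is to replace the offending bytes. This is a sketch only, under the assumption that losing the original bytes in the archived name is acceptable. `utf8_safe_name` is a hypothetical helper, not something seven_zip_ruby provides, and this report does not establish whether the gem can be given an archive-internal name that differs from the on-disk path.

```ruby
# Hypothetical helper (not part of seven_zip_ruby): returns a valid UTF-8
# copy of the name, replacing each invalid byte with `replacement`.
def utf8_safe_name(raw_name, replacement = "?")
  raw_name.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

p utf8_safe_name("non_\x80_utf8.txt".b) # => "non_?_utf8.txt"
```

Names that are already valid UTF-8 pass through unchanged, but two different on-disk names can collapse to the same result, so this is not a substitute for the library handling binary filenames itself.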