rubyinstaller2
Ruby require fails when the path has special characters
What problems are you experiencing?
If the path contains special characters and you run a Ruby script that calls require_relative on a file under that path, the file fails to load. It's almost certainly something to do with encoding on the Windows console.
It failed for me with:
- Windows Terminal
- Windows cmd.exe with both Active code page: 437 and Active code page: 65001
This ticket is based on an issue on rails at https://github.com/rails/rails/issues/29087
Steps to reproduce
Create a folder called Test Ø and put 2 files in it:
1.rb
# encoding: UTF-8
require_relative "2.rb"
puts 'success'
2.rb
puts 'in the file'
You should see an error like this:
$ ruby 1.rb
1.rb:2:in `require_relative': cannot load such file -- D:/projects/blog/_posts-trials/rails/Test ?/2.rb (LoadError)
from 1.rb:2:in `<main>'
What's the output from ridk version?
ruby:
  path: C:/Ruby30-x64
  version: 3.0.3
  platform: x64-mingw32
ruby_installer:
  package_version: 3.0.3-1
  git_commit: 981867a
msys2:
  path: C:\Ruby30-x64\msys64
cc: gcc (Rev2, Built by MSYS2 project) 11.2.0
sh: GNU bash, version 5.1.8(1)-release (x86_64-pc-msys)
os: Microsoft Windows [Version 10.0.19044.1586]
I have run this script in that directory:
p __dir__.encoding
p Dir.pwd.encoding
puts
p __ENCODING__
p ''.encoding
p Encoding.default_external
p Encoding.default_internal
and the output is
#<Encoding:IBM437>
#<Encoding:Windows-1252>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
This is Windows 10 running in Parallels Desktop.
I suspect that just shows my ignorance wrt how file encoding works in Windows, and also what can a Ruby program assume when reading file/directory names.
Thanks for posting this here @fxn - I opened the issue here so that we can get closer to finding the correct place to fix this :) since the issue is clearly to do with Ruby + Windows, and not Rails.
This is what I get with codepage 65001 (UTF-8)
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
and with codepage 437
#<Encoding:IBM437>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
Can you please check your codepage by doing chcp on the command line?
@mohits it says 437.
If I execute chcp 65001, the output is:
#<Encoding:UTF-8>
#<Encoding:Windows-1252>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
Almost!
Hi @fxn - Yes, I think 437 (OEM - United States) is the most common on English Windows. I think we need someone with a better understanding of locales on Windows to look at this issue.
Unsurprisingly, my simple test works on JRuby, of course - it successfully requires the file. Also, your code matches the output for chcp 65001 when run with JRuby even on a console that is CP-437.
$ jruby xfn.rb
#<Encoding:UTF-8>
#<Encoding:Windows-1252>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
Today I could not reproduce, trying more carefully.
The file system in my machine is in Windows-1252. I created a directory called à using the file explorer to make sure the encoding is honored. Inside that directory I created this test file and a dummy bar.rb:
puts Encoding.find('filesystem')
p Dir.pwd.bytes[-1]
require_relative "bar"
This works, and the output is
Windows-1252
224
If you check the codes in Windows-1252, you'll see 224 is, indeed, à.
@mohits Can you reproduce using these steps? Maybe the directory was created with UTF-8 bytes for a non-UTF-8 file system?
However, ø belongs to Windows-1252 (code 248) and the same script prints the expected byte, but fails to perform the require_relative.
This is interesting, because both à and ø are non-ASCII, so I would expect them to succeed or fail in the same way.
@mohits what happens in your machine with à?
hi @fxn - I am a bit confused now with the results I am seeing but I have progress to report (kind of..)
[1] I created this path:
$ cd D:\projects\blog\_posts-trials\rails\Test-à
[2] I ran your code:
$ chcp
Active code page: 437
$ ruby 1.rb
UTF-8
160
On my system, the encoding shows as UTF-8. I did a chcp 1252 and ran the same code, and it gave the same result. This is where it gets interesting. I went to the folder with Test Ø in the name, and ran the code again (still with CP-1252) and it ran successfully.
UTF-8
152
[3] I forced it to change to CP-437 again by doing chcp 437 and it failed, but I got this:
UTF-8
152
1.rb:4:in `require_relative': cannot load such file -- D:/projects/blog/_posts-trials/rails/Rails Server Test ?/2.rb (LoadError)
from 1.rb:4:in `<main>'
It read the character properly (as 152) but failed on the require_relative.
[4] On the other hand, with CP-437, I ran it in the path with Test-à and it worked.
$ ruby 1.rb
UTF-8
160
So, to summarise:
- CP-1252: Both paths worked. Got back UTF-8 and {160, 152} for the byte.
- CP-437: Both paths returned UTF-8 and {160, 152} for the byte. Path with Ø failed to require_relative.
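One guess that fits this summary (only a guess, not a confirmed description of what Ruby does internally): somewhere the path gets converted to the console code page, and a character the code page cannot represent is replaced by "?". The sketch below imitates that with String#encode. The helper name lossy_console_path is made up for illustration.

```ruby
# Hypothetical illustration, NOT Ruby's actual require_relative code path:
# transcode a UTF-8 path to a console code page, substituting "?" for any
# character the code page cannot represent, then convert back to UTF-8.
def lossy_console_path(path, code_page)
  path.encode(code_page, invalid: :replace, undef: :replace, replace: "?")
      .encode("UTF-8")
end

# Folder names from this thread, written as escapes so the encoding of
# this source file does not matter.
paths = ["Test \u00D8/2.rb", "Test-\u00E0/2.rb"]

paths.each do |path|
  %w[IBM437 Windows-1252].each do |code_page|
    converted = lossy_console_path(path, code_page)
    status = converted == path ? "survives" : "is altered"
    puts "#{path.inspect} #{status} under #{code_page}: #{converted.inspect}"
  end
end
```

Under this assumption only the Ø path changes under CP-437 (it becomes Test ?/2.rb, like the path in the LoadError message), while à survives both code pages and Ø survives CP-1252.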
I found this online: http://zuga.net/articles/text-ascii-vs-cp-1252-vs-cp-437/ that compares the code pages side by side.
CP-1252 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for graphical applications under Windows. CP-437 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for console applications under Windows.
In this, CP-1252 has à and ø at 224 and 248 respectively (and Ø at 216). CP-437 has à at 133 but does not have Ø at all.
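The code page comparison can also be checked from Ruby itself instead of a lookup table. This is a small sketch that relies only on String#encode; the helper name bytes_in is made up for illustration.

```ruby
# Characters from this thread, written as escapes so the encoding of this
# source file does not matter.
CHARS = {
  "a-grave" => "\u00E0",
  "O-slash" => "\u00D8",
  "o-slash" => "\u00F8"
}.freeze

CODE_PAGES = %w[UTF-8 Windows-1252 IBM437].freeze

# Returns the bytes of char in code_page, or nil when the code page has
# no mapping for that character.
def bytes_in(char, code_page)
  char.encode(code_page).bytes
rescue Encoding::UndefinedConversionError
  nil
end

CHARS.each do |label, char|
  CODE_PAGES.each do |code_page|
    bytes = bytes_in(char, code_page)
    shown = bytes ? bytes.inspect : "not representable"
    puts format("%-8s %-13s %s", label, code_page, shown)
  end
end
```

The UTF-8 rows end in 160 for à and 152 for Ø, which matches the last bytes printed on Ruby 3.0.3, and the Windows-1252 rows give the single bytes seen on the older Rubies.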
@mohits Which Ruby version is that?
I discovered by testing related things in Zeitwerk that in Ruby 3.0 the file system encoding is assumed (unsure if the verb is correct) to be UTF-8. This issue in Redmine seems relevant.
@fxn - my bad. I should have included the ruby version: 3.0.3.
More information then:
Ruby version | Folder | Code page | File system encoding | Last byte of Dir.pwd | Result
Ruby 3.0.3 | Test Ø | CP-437 | UTF-8 | 152 | Fails to require
Ruby 2.7.4 | Test Ø | CP-437 | Windows-1252 | 216 | Fails to require
Ruby 2.6.8 | Test Ø | CP-437 | Windows-1252 | 216 | Fails to require
Ruby 3.0.3 | Test Ø | CP-1252 | UTF-8 | 152 | require_relative works
Ruby 2.7.4 | Test Ø | CP-1252 | Windows-1252 | 216 | require_relative works
Ruby 2.6.8 | Test Ø | CP-1252 | Windows-1252 | 216 | require_relative works
Ruby 3.0.3 | Test-à | CP-437 | UTF-8 | 160 | require_relative works
Ruby 2.7.4 | Test-à | CP-437 | Windows-1252 | 224 | require_relative works
Ruby 2.6.8 | Test-à | CP-437 | Windows-1252 | 224 | require_relative works
Ruby 3.0.3 | Test-à | CP-1252 | UTF-8 | 160 | require_relative works
Ruby 2.7.4 | Test-à | CP-1252 | Windows-1252 | 224 | require_relative works
Ruby 2.6.8 | Test-à | CP-1252 | Windows-1252 | 224 | require_relative works
Yes, the issue on Redmine does seem relevant and might explain the result we see for the character code and encoding... but it appears that require_relative uses some other encoding for the file path/name?
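To narrow down whether the loader really treats the path differently from the rest of Ruby, a probe like the one below could be run in each folder under each code page. It is only a diagnostic sketch: the helper name probe and the keys of the hash it returns are made up here, and it uses require with a full path as a stand-in for require_relative.

```ruby
# Hypothetical diagnostic helper: reports how Ruby sees a directory and
# whether a file inside it can be both found and required.
def probe(dir, file = "2.rb")
  target = File.join(dir, file)
  result = {
    dir_encoding: dir.encoding,          # encoding tag on the path string
    last_byte: dir.bytes.last,           # same check as Dir.pwd.bytes[-1]
    exist: File.exist?(target)           # file API view of the path
  }
  begin
    require target                       # loader view of the same path
    result[:loadable] = true
  rescue LoadError => e
    result[:loadable] = false
    result[:error] = e.message
  end
  result
end

# Run from inside the test folder, next to 2.rb.
p probe(Dir.pwd) if __FILE__ == $PROGRAM_NAME
```

If exist comes back true while loadable is false for the Ø folder under CP-437, that would suggest the file APIs and the loader handle the path differently, which is the open question here.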