[BUG] UTF-8 YAML files with accents in version 1.9.0 raise incompatible character encodings: UTF-8 and ASCII-8BIT

Open kivanio opened this issue 3 years ago • 21 comments

For a long time I have had code like this:

STATES = I18n.t('states').with_indifferent_access.freeze

The yaml file is in UTF-8 and it has accents in some words:

pt-BR:
  states:
    Acre: AC
    Amapá: AP
    Ceará: CE
    Piauí: PI
    Paraná: PR

With the new 1.9.0 release, it started failing in our CI in a lot of places:

ActionView::Template::Error incompatible character encodings: UTF-8 and ASCII-8BIT
Failure/Error: = f.select(:state, City::STATES,

ActionView::Template::Error:
  incompatible character encodings: UTF-8 and ASCII-8BIT
./app/views/customers/_form.html.slim:86:in `block in 

Downgrading to 1.8.11 makes everything work again.

With 1.9.0, the Rails console loads it like this:

rails c
Running via Spring preloader in process 4964
Loading development environment (Rails 6.1.4.4)
[1] pry(main)> I18n.t('states').with_indifferent_access.freeze
=> {"Acre"=>"AC",
 "Alagoas"=>"AL",
 "Amazonas"=>"AM",
 "Amap\xC3\xA1"=>"AP",
 "Bahia"=>"BA",
 "Cear\xC3\xA1"=>"CE",
 "Distrito Federal"=>"DF",
 "Esp\xC3\xADrito Santo"=>"ES",
 "Goi\xC3\xA1s"=>"GO",
 "Maranh\xC3\xA3o"=>"MA",
 "Minas Gerais"=>"MG",
 "Mato Grosso do Sul"=>"MS",
 "Mato Grosso"=>"MT",
 "Par\xC3\xA1"=>"PA",
 "Para\xC3\xADba"=>"PB",
 "Pernambuco"=>"PE",
 "Piau\xC3\xAD"=>"PI",
 "Paran\xC3\xA1"=>"PR",
 "Rio de Janeiro"=>"RJ",
 "Rio Grande do Norte"=>"RN",
 "Rond\xC3\xB4nia"=>"RO",
 "Roraima"=>"RR",
 "Rio Grande do Sul"=>"RS",
 "Santa Catarina"=>"SC",
 "Sergipe"=>"SE",
 "S\xC3\xA3o Paulo"=>"SP",
 "Tocantins"=>"TO"}

With 1.8.11, the Rails console loads it like this:

rails c
Running via Spring preloader in process 3411
Loading development environment (Rails 6.1.4.4)
[1] pry(main)> I18n.t('states').with_indifferent_access.freeze
=> {"Acre"=>"AC",
 "Alagoas"=>"AL",
 "Amazonas"=>"AM",
 "Amap\xC3\xA1"=>"AP",
 "Bahia"=>"BA",
 "Cear\xC3\xA1"=>"CE",
 "Distrito Federal"=>"DF",
 "Esp\xC3\xADrito Santo"=>"ES",
 "Goi\xC3\xA1s"=>"GO",
 "Maranh\xC3\xA3o"=>"MA",
 "Minas Gerais"=>"MG",
 "Mato Grosso do Sul"=>"MS",
 "Mato Grosso"=>"MT",
 "Par\xC3\xA1"=>"PA",
 "Para\xC3\xADba"=>"PB",
 "Pernambuco"=>"PE",
 "Piau\xC3\xAD"=>"PI",
 "Paran\xC3\xA1"=>"PR",
 "Rio de Janeiro"=>"RJ",
 "Rio Grande do Norte"=>"RN",
 "Rond\xC3\xB4nia"=>"RO",
 "Roraima"=>"RR",
 "Rio Grande do Sul"=>"RS",
 "Santa Catarina"=>"SC",
 "Sergipe"=>"SE",
 "S\xC3\xA3o Paulo"=>"SP",
 "Tocantins"=>"TO"}

The output seems exactly the same. Is that a bug in the new version?

It is probably something between the new version and loading in Rails. This is just a sample; I have other files with accents and all of them raise the same exception.
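
As a minimal sketch (not part of the original report, and using plain Ruby only): two strings can hold identical bytes yet carry different encoding tags, and joining them raises the same class of error as the view above. The variable names are illustrative. Checking `.encoding` is likely a more reliable diagnostic than comparing printed console output.

```ruby
# Two strings with identical bytes but different encoding tags.
utf8   = "Amap\u00E1"   # "Amapá", tagged UTF-8
binary = utf8.b          # same bytes, tagged ASCII-8BIT (also called BINARY)

same_bytes   = utf8.bytes == binary.bytes   # compare the raw bytes
same_strings = utf8 == binary               # compare the strings themselves

# Joining the two is where Ruby refuses to guess an encoding.
error = begin
  utf8 + binary
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts utf8.encoding
puts binary.encoding
puts same_bytes
puts same_strings
puts error.class
```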

Versions of i18n, rails, and anything else you think is necessary

ruby: 3.0.3 i18n: 1.9.0 rails: 6.1.4.3 rspec-rails: 5.1.0 rspec: 3.10.0

and # frozen_string_literal: true in ruby files.

kivanio avatar Jan 27 '22 18:01 kivanio

Hello, thank you for the detailed reproduction steps.

I was unable to reproduce this issue using:

  • Ruby 3.0.0
  • i18n 1.9.1
  • Rails 6.1.4.3

(The RSpec versions are highly likely to be irrelevant to this issue)

I will try with Ruby 3.0.3 now.

radar avatar Jan 27 '22 22:01 radar

I am unable to reproduce this issue with 3.0.3. Could you please put a Rails app that does reproduce this issue on GitHub so that I can clone it down and investigate?

radar avatar Jan 27 '22 22:01 radar

We have a similar issue. I've created this repo to demonstrate the difference.

Using i18n 1.8.11 the keys are symbolized with UTF-8 encoding. With 1.9.1 they are ASCII-8BIT.

gem 'i18n', "< 1.9.0"

I18n.t("foobar").keys => [:ö]
I18n.t("foobar").keys.first.encoding => #<Encoding:UTF-8>
I18n.t("foobar").values.first.encoding => #<Encoding:UTF-8>

gem 'i18n', "> 1.9.0"

I18n.t("foobar").keys => [:"\xC3\xB6"]
I18n.t("foobar").keys.first.encoding => #<Encoding:ASCII-8BIT>
I18n.t("foobar").values.first.encoding => #<Encoding:UTF-8>

https://github.com/joergschiller/i18n_issue_606/blob/293513b938310239e06bf4062be975e0ec772fc0/Gemfile#L6-L12

(With Ruby 3.0.3)
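
For anyone who wants to check their own translations for the same symptom, here is a rough sketch. `binary_symbol_paths` is a hypothetical helper (it is not part of i18n), and the `translations` hash is only a stand-in for what `I18n.t("foobar")` might return in an affected app.

```ruby
# Hypothetical helper: collect the key paths whose Symbol is tagged ASCII-8BIT.
# Pure-ASCII symbols are tagged US-ASCII, so they are never reported.
def binary_symbol_paths(hash, path = [])
  hash.flat_map do |key, value|
    here  = path + [key]
    found = key.is_a?(Symbol) && key.encoding == Encoding::BINARY ? [here] : []
    value.is_a?(Hash) ? found + binary_symbol_paths(value, here) : found
  end
end

# Stand-in data: one healthy key and one mis-tagged key with the same bytes.
good = "\u00F6".to_sym      # :ö, tagged UTF-8
bad  = "\u00F6".b.to_sym    # same bytes, tagged ASCII-8BIT
translations = { foobar: { good => "a", bad => "b" } }

paths = binary_symbol_paths(translations)
puts paths.length
```

In a real app the helper could be pointed at the hash returned by `I18n.t` for any nested key.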

joergschiller avatar Jan 27 '22 22:01 joergschiller

Thank you @joergschiller, I am not alone 🙏🏻 I was going to make a repo and you saved me. I thought it was something intended in the new version, but now I think we have a bug.

kivanio avatar Jan 28 '22 01:01 kivanio

Thank you @joergschiller. I can reproduce this issue now with your repository. I'll find the commit that broke it.

radar avatar Jan 28 '22 03:01 radar

The commit that broke this behaviour is 0fda789ea745cd462658a8948ee085201aba5c6f, as discovered through a git bisect:

 ~/code/gems/i18n   v1.9.0~7^2 (bisect)  bad
0fda789ea745cd462658a8948ee085201aba5c6f is the first bad commit
commit 0fda789ea745cd462658a8948ee085201aba5c6f
Author: Paarth Madan <[email protected]>
Date:   Wed Nov 3 12:33:12 2021 -0400

    Symbolize and freeze keys when loading from YAML

 lib/i18n/backend/base.rb    | 2 +-
 test/backend/simple_test.rb | 7 ++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)
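
For context, a small sketch of what symbolizing keys at YAML load time looks like with plain Psych and no Bootsnap involved. The YAML content is assumed to resemble the reproduction repo, and `symbolize_names: true` is used here only because the commit title mentions symbolized keys; the exact options i18n passes are not shown in this thread.

```ruby
require "yaml"

# Assumed shape of the repro file: a non-ASCII key with a non-ASCII value.
source = "foobar:\n  \u00F6: \u00FC\n"

# Psych converts mapping keys to symbols while loading.
data = YAML.safe_load(source, symbolize_names: true)
key  = data[:foobar].keys.first

puts key.encoding                  # encoding tag of the symbolized key
puts data[:foobar][key].encoding   # encoding tag of the value
```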

radar avatar Jan 28 '22 03:01 radar

@paarthmadan Do you have any time today to investigate this one?

radar avatar Jan 28 '22 03:01 radar

I looked into where the unsafe_load_file method was coming from: Bootsnap. cc @casperisfine.

/Users/ryan.bigg/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/bootsnap-1.10.2/lib/bootsnap/compile_cache/yaml.rb:203:in `unsafe_load_file': wrong number of arguments (given 0, expected 1+) (ArgumentError)

If I remove bootsnap from this test application, the issue goes away:

irb(main):001:0> I18n.t(:foobar).keys[0]
=> :ö

@casperisfine: Would you like me to create an issue on the Bootsnap repo page for this one?

radar avatar Jan 28 '22 04:01 radar

👀

casperisfine avatar Jan 28 '22 07:01 casperisfine

Ah damn it, I know what the problem is. It's because Bootsnap uses msgpack to accelerate YAML parsing, and MessagePack uses an API that doesn't preserve symbol encodings properly. See an issue I opened a while ago: https://github.com/msgpack/msgpack-ruby/pull/211

Let me go over my old research to see how we could sidestep this in Bootsnap. I'll update here ASAP.
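
To illustrate the effect described here without depending on msgpack, a hedged sketch: `lossy_round_trip` is a made-up stand-in for any serializer that keeps a symbol's bytes but drops its encoding tag. It is not msgpack's actual code. `retag_utf8` is likewise a hypothetical repair that is only safe when the bytes really are UTF-8.

```ruby
# Stand-in for a serializer that stores a symbol's bytes but not its encoding.
# This is NOT msgpack's implementation, only an illustration of the effect.
def lossy_round_trip(symbol)
  raw = symbol.to_s.b   # the bytes survive, the UTF-8 tag does not
  raw.to_sym
end

# Re-tag a symbol's bytes as UTF-8 (assumes the bytes are valid UTF-8).
def retag_utf8(symbol)
  symbol.to_s.dup.force_encoding(Encoding::UTF_8).to_sym
end

original = "\u00F6".to_sym
restored = lossy_round_trip(original)
repaired = retag_utf8(restored)

puts restored.encoding
puts restored == original
puts repaired == original
```

Symbols with the same bytes but different encodings are distinct objects, which would explain why lookups and string joins start to misbehave after such a round trip.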

casperisfine avatar Jan 28 '22 07:01 casperisfine

Ok, so the bug is actually in msgpack, I opened a PR here: https://github.com/msgpack/msgpack-ruby/pull/246

You can apply the patch with:

gem 'msgpack', github: 'Shopify/msgpack-ruby', branch: 'symbolize-keys-fix-encoding'
>> I18n.t(:foobar)
=> {:ö=>"ü"}

Alternatively, if you'd rather not run a gem branch, you can disable Bootsnap YAML caching. Sorry for the bug :/
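
A sketch of what disabling the YAML cache might look like in a Rails app's config/boot.rb, replacing the usual `require "bootsnap/setup"` line. The option names follow the Bootsnap README as far as I recall it, so please check them against the Bootsnap version you have installed; the cache directory and environment handling are assumptions.

```ruby
# config/boot.rb (sketch): use this instead of `require "bootsnap/setup"`
require "bootsnap"

Bootsnap.setup(
  cache_dir:          "tmp/cache",   # assumed location of the cache
  development_mode:   ENV.fetch("RAILS_ENV", "development") == "development",
  load_path_cache:    true,
  compile_cache_iseq: true,
  compile_cache_yaml: false          # skip the msgpack-backed YAML cache
)
```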

casperisfine avatar Jan 28 '22 09:01 casperisfine

Thank you very much for the deep investigation here :)

radar avatar Jan 28 '22 22:01 radar

@casperisfine Is there something we could do to get that msgpack/msgpack-ruby PR merged + released? Is there a bribe of cookies that needs to be made here?

radar avatar Feb 03 '22 23:02 radar

Not that I know of. He did acknowledge seeing some of my other PRs, he's probably busy.

I don't like to nag maintainers, but I see the same bug was reported again yesterday, so I'll ping him on that PR just this once.

casperisfine avatar Feb 04 '22 07:02 casperisfine

He said early next week hopefully.

casperisfine avatar Feb 04 '22 09:02 casperisfine

If you are interested, I could add some feature checking in I18n, to sidestep the optimization when it's bogus.

casperisfine avatar Feb 04 '22 09:02 casperisfine

@casperisfine That could be a good workaround in the meantime, I think. Could you please find out what that would take?

radar avatar Feb 04 '22 23:02 radar

Uh, I just came back to this issue, and I was certain I had already answered :/

So first, the fix was merged upstream today, but I'm not sure when there will be a release.

For the workaround, we could simply test whether the bug is present with something like:

√: ~

and then from ruby:

if YAML.unsafe_load_file("test.yml", symbolize_names: true).keys.first.encoding == Encoding::UTF_8
  # it works, we can use the optimization
  # ...
end
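
A self-contained version of that check could look roughly like the following. `symbol_encoding_preserved?` is a hypothetical name, the probe file is written to a temp file instead of a fixed test.yml, and the fallback to `YAML.load_file` is an assumption for older Psych versions that lack `unsafe_load_file`.

```ruby
require "yaml"
require "tempfile"

# Hypothetical helper: true when symbolized non-ASCII keys come back as UTF-8.
def symbol_encoding_preserved?
  Tempfile.create(["encoding_probe", ".yml"]) do |file|
    file.write("\u221A: ~\n")   # the one-line YAML probe: a "√" key with a nil value
    file.flush

    # Older Psych versions may not define unsafe_load_file.
    loader = YAML.respond_to?(:unsafe_load_file) ? :unsafe_load_file : :load_file
    data = YAML.public_send(loader, file.path, symbolize_names: true)

    data.keys.first.encoding == Encoding::UTF_8
  end
end

puts symbol_encoding_preserved?
```

Run inside an app, this goes through whatever YAML loader is actually patched in, so it should reflect the presence or absence of the bug in that environment.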

casperisfine avatar Feb 08 '22 22:02 casperisfine

Reviewing this again, I think we'll just wait for a new msgpack release to happen, and then advise people who encounter this issue to upgrade to that new version.

I'll be leaving this issue open until that new version is out.

radar avatar Feb 14 '22 07:02 radar

msgpack 1.4.5 was released a few hours ago and should solve this issue: https://rubygems.org/gems/msgpack/versions/1.4.5
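
If you want to confirm which side of the fix an app is on, a small sketch using RubyGems version comparison. `msgpack_fixed?` and `FIXED_MSGPACK` are hypothetical names, and the 1.4.5 threshold is taken from the release mentioned above.

```ruby
require "rubygems"

# First msgpack release reported in this thread to contain the fix.
FIXED_MSGPACK = Gem::Version.new("1.4.5")

# Hypothetical helper: does this msgpack version string include the fix?
def msgpack_fixed?(version)
  Gem::Version.new(version) >= FIXED_MSGPACK
end

# In a running app, RubyGems knows the loaded msgpack version (if any).
spec = Gem.loaded_specs["msgpack"]
puts spec ? msgpack_fixed?(spec.version.to_s) : "msgpack is not loaded"
```

Running `bundle update msgpack` and then re-checking is probably the simplest way to pick up the fix.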

casperisfine avatar Feb 15 '22 13:02 casperisfine

Thank you all for the hard work! 👏🏻

kivanio avatar Feb 15 '22 14:02 kivanio