ruru icon indicating copy to clipboard operation
ruru copied to clipboard

RString default encoding is ASCII-8Bit

Open mjc-gh opened this issue 8 years ago • 10 comments

I made a small example to show the "issue". It is available here: https://github.com/mikeycgto/ruru-rstring-encoding#example.

While this isn't incorrect it does feel a little odd since in modern Ruby and Rust both default to UTF-8 encoding.

Perhaps ruby-sys should expose rb_utf8_str_new as well? The RString struct could then maybe have a new_from_utf8 function which will ultimately call this Ruby C function. I can make the necessary PRs but I wanted some feedback first.

mjc-gh avatar Feb 24 '17 15:02 mjc-gh

Created two commits to deal with this issue:

mjc-gh avatar Feb 27 '17 20:02 mjc-gh

#71 will be merged after we release new version of ruby-sys

d-unsed avatar May 24 '17 17:05 d-unsed

Any update on releasing a new version of ruby-sys with steveklabnik/ruby-sys/pull/23 merged? Thanks!

mjc-gh avatar Jul 13 '17 15:07 mjc-gh

I've published ruby-sys 0.3.0

steveklabnik avatar Jul 26 '17 17:07 steveklabnik

On encoding I'd be interested if we could implement the same encoding support that Ruby has. I'm playing around with some ideas. The main issue to test against is this test https://github.com/ruby/spec/blob/29b472cf57946ce271533fc6e03716908a313aaf/core/file/basename_spec.rb#L162-L166 which simply tests the same encoding goes in and out.

danielpclark avatar Sep 12 '17 13:09 danielpclark

I suppose this could be closed per #71, yeah?

turboladen avatar Dec 20 '17 04:12 turboladen

I have one more question regarding encodings. Since Ruby by default creates a string with UTF-8 encoding ('str'.encoding => #<Encoding:UTF-8>), what do you think about changing the default behavior on the ruru side?

I mean the following steps:

  1. Create a new method RString::new_ascii which returns ascii-encoded string (behaves the same as current RString::new using rb_str_new)

  2. Change RString::new to return a UTF8-encoded string (behaves the same as current RString::new_utf8 using rb_utf8_str_new) and remove RString::new_utf8.

Then ruru and ruby will have the same default behavior. This can be implemented in 0.10.0

d-unsed avatar Dec 20 '17 10:12 d-unsed

and remove RString::new_utf8

I think changing the RString::new method to expected behavior is a good idea. But I think you should keep the RString::new_utf8 method around as it will help people who want to be more explicit with their code. Anyone currently using RString::new_utf8 will be better off if it stays.

danielpclark avatar Dec 20 '17 11:12 danielpclark

I agree with making the default for RString::new be a UTF8 string since that's how Ruby behaves.

I think keeping RString::new_utf8 around is good idea as well. Perhaps we could also add RString::new_ascii which would returns an a ASCII-8bit encode string.

mjc-gh avatar Dec 20 '17 16:12 mjc-gh

I agree with making the default for RString::new be a UTF8 string since that's how Ruby behaves.

This totally makes sense to me.

turboladen avatar Jan 04 '18 00:01 turboladen