php-serialize icon indicating copy to clipboard operation
php-serialize copied to clipboard

Hashes & Arrays with multibyte characters raise a TypeError

Open keichan34 opened this issue 11 years ago • 2 comments

It seems that a couple of the latest commits break hashes / arrays with multibyte characters. I've written a failing test case, and the error message is the following:

test_assoc_multibyte(TestPhpSerialize) [test.rb:102]:
Exception raised:
<#<TypeError: Unable to unserialize type 'h'>>.

Re-basing to 32463ff6e837b7e003fca02a98af995177471924 seems to work, and since commits since then are performance related (and not bug fixes), fixing this wasn't a high priority for the app I'm using this library in. It should be fixed, though.. Can someone help me out? :smiley:

keichan34 avatar Mar 16 '13 06:03 keichan34

I'm pretty sure I've fixed the problem; PHP serialize use the number of bytes to read in a string (not characters). I fixed by forcing StringIO into thinking that we have a binary-encoded string (ASCII), then forcing the string to be the correct encoding before returning.

Since this uses String#force_encoding and String#encoding, I know this breaks compatibility with 1.8.7, but it's working for me (tested on 1.9.3, 2.0.0, rbx 2.0.0-rc1, jruby 1.7.3)

Edit: It would be possible to bring back compatibility to 1.8.7 by using respond_to?, but is it worth it?

keichan34 avatar May 15 '13 10:05 keichan34

I think we can drop Ruby 1.8 support at this point (ha!).

I'll need to look into this a bit more but this looks like a great change.

jqr avatar Jun 29 '17 15:06 jqr

I have synchronized this patch to work with the latest commit e23fcfd.

https://github.com/jqr/php-serialize/compare/master...kaorukobo:php-serialize:keichan34-multibyte-array-hashes-again

I would be happy if this could be merged to master. 🙂

kaorukobo avatar Feb 21 '24 07:02 kaorukobo

@jqr Thanks for merging! I no longer use this gem (nor do I use Ruby these days..) but it looks like now there's a method String#bytesize that might be a bit more appropriate than using force_encoding (could have been there when I originally wrote this patch, it was 11 years ago, I have almost completely forgotten about this 😆 ). That said, if it works, don't break it... 🤣

keichan34 avatar Feb 25 '24 23:02 keichan34