php-serialize
php-serialize copied to clipboard
Hashes & Arrays with multibyte characters raise a TypeError
It seems that a couple of the latest commits break hashes / arrays with multibyte characters. I've written a failing test case, and the error message is the following:
test_assoc_multibyte(TestPhpSerialize) [test.rb:102]:
Exception raised:
<#<TypeError: Unable to unserialize type 'h'>>.
Re-basing to 32463ff6e837b7e003fca02a98af995177471924 seems to work, and since commits since then are performance related (and not bug fixes), fixing this wasn't a high priority for the app I'm using this library in. It should be fixed, though.. Can someone help me out? :smiley:
I'm pretty sure I've fixed the problem; PHP serialize use the number of bytes to read in a string (not characters). I fixed by forcing StringIO
into thinking that we have a binary-encoded string (ASCII), then forcing the string to be the correct encoding before returning.
Since this uses String#force_encoding
and String#encoding
, I know this breaks compatibility with 1.8.7, but it's working for me (tested on 1.9.3, 2.0.0, rbx 2.0.0-rc1, jruby 1.7.3)
Edit: It would be possible to bring back compatibility to 1.8.7 by using respond_to?
, but is it worth it?
I think we can drop Ruby 1.8 support at this point (ha!).
I'll need to look into this a bit more but this looks like a great change.
I have synchronized this patch to work with the latest commit e23fcfd.
https://github.com/jqr/php-serialize/compare/master...kaorukobo:php-serialize:keichan34-multibyte-array-hashes-again
I would be happy if this could be merged to master. 🙂
@jqr Thanks for merging! I no longer use this gem (nor do I use Ruby these days..) but it looks like now there's a method String#bytesize
that might be a bit more appropriate than using force_encoding (could have been there when I originally wrote this patch, it was 11 years ago, I have almost completely forgotten about this 😆 ). That said, if it works, don't break it... 🤣