crack
Crack::JSON is not parsing UTF-8 correctly
Hi, I have found that Crack::JSON does not parse UTF-8 strings correctly.
Sample input:
{"winstrom":{"widget":[{"name":"John Ďoe","age":"3.14"}]}}
I get this:
{"winstrom"=>{"widget"=>[{"name"=>"John Ďoe", " age"=>" 3.14"}]}}
(Note the extra leading space in " age" and " 3.14".)
This change fixes the problem for me:
https://github.com/jnunemaker/crack/blob/master/lib/crack/json.rb#L46
# changing this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json), false, [], nil, [], []
# to this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')), false, [], nil, [], []
Info found here:
- http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
- http://robots.thoughtbot.com/post/42664369166/fight-back-utf-8-invalid-byte-sequences
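A minimal sketch of what the `encode` call above does to a string on its own, outside of crack. It assumes the input is UTF-8, as in the sample; the variable names are only for illustration. Because the source encoding is given as 'binary', every byte at or above 0x80 counts as undefined and is replaced with an empty string, so the non-ASCII character is dropped rather than preserved:

```ruby
# Assumption: UTF-8 input, as in the sample above ("John Ďoe").
# "\u010E" is "Ď", written as an escape so the snippet does not depend
# on the source file's encoding.
name = "John \u010Eoe"

# Transcoding *from* 'binary' treats the two bytes of "Ď" (0xC4 0x8E)
# as undefined characters; replace: '' removes them.
cleaned = name.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

p cleaned           # => "John oe"
p cleaned.encoding  # => #<Encoding:UTF-8>
```

So the workaround appears to avoid the misplaced spaces by making the string ASCII-only, at the cost of losing the non-ASCII characters.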
I am not sure this is the right solution to the problem. It looks like Ruby's StringScanner does not handle UTF-8 strings well.
Both the crack and WebMock gems have this problem, since WebMock uses a stripped-down version of crack's code.
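A small sketch that may explain the StringScanner suspicion. `StringScanner#pos` reports a byte offset, while `String#[]` indexes by character, so the two disagree once a multibyte character has been scanned. That crack uses the scanner position as a string index is an assumption here, not something verified in this snippet; the JSON string is a shortened version of the sample:

```ruby
require 'strscan'

# Shortened sample; "\u010E" is "Ď" (two bytes in UTF-8).
json = "{\"name\":\"John \u010Eoe\",\"age\":\"3.14\"}"
scanner = StringScanner.new(json)

# Advance to just past the colon that follows "age".
scanner.scan_until(/"age"/)
scanner.scan_until(/:/)

byte_pos = scanner.pos      # byte offset
char_pos = scanner.charpos  # character offset (Ruby 2.0+)

p byte_pos                      # => 26
p char_pos                      # => 25

# Using the byte offset as a character index lands one character too far:
p json[byte_pos - 1]            # => "\""
# Indexing by byte (or using charpos) finds the colon as intended:
p json.byteslice(byte_pos - 1)  # => ":"
p json[char_pos - 1]            # => ":"
```

If that is the cause, using `charpos` or `byteslice` consistently would keep the non-ASCII characters intact, unlike stripping them with `encode`.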
It does not work well with Chinese characters either!