Crack::JSON is not parsing UTF-8 correctly

Open alhafoudh opened this issue 11 years ago • 1 comments

Hi, I have found that UTF-8 string parsing does not work correctly.

Sample input:

{"winstrom":{"widget":[{"name":"John Ďoe","age":"3.14"}]}}

I get this:

{"winstrom"=>{"widget"=>[{"name"=>"John Ďoe", " age"=>" 3.14"}]}}
                                               ^       ^

This change fixes the problem:

https://github.com/jnunemaker/crack/blob/master/lib/crack/json.rb#L46

# changing this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json), false, [], nil, [], []

# to this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')), false, [], nil, [], []
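
For reference, here is a small sketch of what the `encode` call in that change does on its own, outside of crack. The sample string and the variable names are illustrative and not taken from crack. Because the source encoding is given as 'binary', every byte above 0x7F counts as undefined in the conversion and is replaced with the empty string, so non-ASCII characters appear to be dropped rather than preserved:

```ruby
# Illustrative only: sample and cleaned are made-up names, not crack code.
# "\u010E" is the character Ď, written as an escape so the snippet does not
# depend on the encoding of the source file.
sample = "John \u010Eoe"

# Same arguments as in the proposed change above.
cleaned = sample.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

puts sample.bytesize    # Ď occupies two bytes in UTF-8
puts cleaned            # the two bytes of Ď have been removed
puts cleaned.encoding   # the result is tagged as UTF-8
```

So the change may avoid the misplaced spaces at the cost of losing the non-ASCII characters, which is worth checking before adopting it.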

Info found here:

  • http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
  • http://robots.thoughtbot.com/post/42664369166/fight-back-utf-8-invalid-byte-sequences

I am not sure if this is the right solution to this problem. It looks like Ruby's StringScanner does not handle UTF-8 strings well.
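
One possible explanation, offered as an assumption rather than a confirmed diagnosis of crack's code: `StringScanner#pos` reports byte offsets, while `String#[]` indexes by character, so after a multi-byte character such as Ď the two disagree by one. The sketch below uses a shortened version of the sample input. `StringScanner#charpos` (available in Ruby 2.0 and later) returns the character offset instead.

```ruby
require 'strscan'

# Shortened sample input; "\u010E" is Ď, written as an escape so the
# snippet does not depend on the encoding of the source file.
json = %Q({"name":"John \u010Eoe","age":"3.14"})

scanner = StringScanner.new(json)
scanner.scan_until(/,/)            # stop just after the first comma

byte_pos = scanner.pos - 1         # byte offset of the comma
char_pos = scanner.charpos - 1     # character offset of the comma

puts byte_pos                      # one larger than char_pos, because Ď is two bytes
puts char_pos
puts json[char_pos]                # the comma itself
puts json[byte_pos]                # the character after the comma
```

If crack stores `scanner.pos` and later uses it as a string index, this one-position shift would be consistent with the misplaced spaces in the output above.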

Both gems, crack and WebMock, have this problem, since WebMock uses a stripped-down version of crack's code.

alhafoudh, Oct 27 '13 23:10

It does not work well with Chinese characters either!

showlovel, Jan 06 '15 04:01