yajl-ruby
Yajl can't decode its own output for some byte packed strings
I have some code that produces a packed array of bytes. When I encode it and then parse the result, parsing fails.
bytes.to_a.inspect => [128, 85, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 152, 2, 0, 0, 2, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 18, 8, 0, 0, 0, 0, 0, 0]
Yajl::Encoder.encode(bytes) => "?U\u0001\u0000\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0000\u0000\u0000\u0000?\u0002\u0000\u0000\u0002\u0000\u0000\u0000\u0000\u0004\u0000\u0000\u0000\u0000\u0000\u0000\u0012\b\u0000\u0000\u0000\u0000\u0000\u0000"
Yajl::Parser.parse(Yajl::Encoder.encode(bytes)) raises: lexical error: invalid bytes in UTF8 string.
"?U\u0001\u0000\u0000\u0000\u000
(right here) ------^
The caret above doesn't render correctly, but it points at the 'U' after the '?'.
I'm not entirely sure, but it seems the parser never handles the output when the first bytes are [128, 85].
This is on REE 1.8.7. Any ideas as to what might be wrong here?
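A minimal sketch of why the parser rejects this input, assuming Ruby 1.9 or later (REE 1.8.7 strings carry no encoding, so `force_encoding` and `valid_encoding?` don't exist there). It rebuilds the reported bytes and checks them as UTF-8. The two bytes that display as '?' in the encoded output, 128 (0x80) and 152 (0x98), both have the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so neither can start a character.

```ruby
# Bytes from the report, packed into a binary (ASCII-8BIT) string.
bytes = [128, 85, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0,
         152, 2, 0, 0, 2, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0,
         18, 8, 0, 0, 0, 0, 0, 0]
raw = bytes.pack("C*")

# Reinterpret the same bytes as UTF-8 text without converting them.
as_utf8 = raw.dup.force_encoding("UTF-8")

# 0x80 is 10000000 in binary: a continuation byte, not a valid lead byte.
puts format("first byte: 0x%02X -> %08b", bytes[0], bytes[0])
puts "valid UTF-8? #{as_utf8.valid_encoding?}"

# Walk the characters; Ruby yields each invalid byte as its own broken
# one-byte "character", so we can record its byte offset.
bad_offsets = []
offset = 0
as_utf8.each_char do |c|
  bad_offsets << offset unless c.valid_encoding?
  offset += c.bytesize
end
puts "invalid byte offsets: #{bad_offsets.inspect}" # prints [0, 16]
```

Every other byte in the array is below 128 and therefore plain ASCII, which is why only offsets 0 and 16 are flagged.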
That looks like binary data. If so, it's not allowed inline in a JSON stream: JSON strings must be valid Unicode text, and a byte like 128 (0x80) is not valid UTF-8 on its own. That said, yajl 1.0 (the version currently bundled in yajl-ruby) doesn't validate the string data fed to the encoder; you could give it the raw bytes of a GIF and it would just append them as-is to the output buffer. yajl 2.0 added UTF-8 validation to the encoder, which would catch this. If this is binary data and you absolutely need to use JSON, I'd Base64-encode it first so the payload is plain ASCII. Otherwise, maybe take a look at MessagePack?
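A sketch of both suggestions, assuming Ruby 2.4 or later (for `String#unpack1`) and using the stdlib `json` library instead of yajl so it runs without extra gems. `ensure_utf8!` is a hypothetical helper that mimics encoder-side UTF-8 validation of the kind yajl 2.0 added; it is not part of yajl-ruby's API. The `"payload"` key is likewise an illustrative name.

```ruby
require "json"

# Hypothetical guard: reject strings that are not valid UTF-8 before encoding.
def ensure_utf8!(str)
  s = str.dup.force_encoding("UTF-8")
  raise ArgumentError, "not valid UTF-8; Base64-encode binary data first" unless s.valid_encoding?
  s
end

bytes = [128, 85, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0,
         152, 2, 0, 0, 2, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0,
         18, 8, 0, 0, 0, 0, 0, 0]
raw = bytes.pack("C*")

# The raw bytes fail validation, as the bundled yajl 1.0 never checked.
rejected = begin
  ensure_utf8!(raw)
  false
rescue ArgumentError
  true
end
puts "raw bytes rejected? #{rejected}" # prints true

# Strict Base64 ("m0": no line breaks) produces pure ASCII, which passes.
encoded = ensure_utf8!([raw].pack("m0"))
json = JSON.generate("payload" => encoded)

# Decoding reverses both steps: parse the JSON, then strip the Base64.
decoded = JSON.parse(json)["payload"].unpack1("m0")
puts json
puts decoded.bytes == bytes # prints true
```

Base64 inflates the data by about a third; if that overhead matters, a binary format such as MessagePack avoids it.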