goavro

textual format does not correctly decode unicode strings

Open rogpeppe opened this issue 6 years ago • 0 comments

In JSON, the Unicode characters with code points 0x7f to 0xff can be encoded either directly as those characters or with a Unicode escape sequence (e.g. \u00ff for ÿ).

As such, JSON with either of these two alternatives should be treated the same by goavro.
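To make the expected behaviour concrete, here is a minimal sketch using only Go's standard encoding/json package (not goavro). The helper name decodeJSONString is made up for illustration. It shows a standard decoder producing the same Go string from both spellings, which is the equivalence goavro's textual decoder would be expected to honour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper: it decodes a single JSON
// string literal into a Go string using the standard library decoder.
func decodeJSONString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	// Same character, two spellings: an escape sequence and the literal rune.
	escaped, err := decodeJSONString(`"\u00ff"`)
	if err != nil {
		panic(err)
	}
	literal, err := decodeJSONString(`"ÿ"`)
	if err != nil {
		panic(err)
	}
	fmt.Println(escaped == literal) // prints "true"
}
```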

Unfortunately, it does not do that. This code demonstrates the issue: https://play.golang.org/p/FxpmTjfmI15

This issue means that it's not possible to take JSON that's been encoded with or round-tripped through a normal JSON encoder and decode it correctly with goavro.

For example, Avro JSON data that's piped through the jq command, which rewrites the escape sequence as the literal character, can end up decoded incorrectly by goavro:

% echo '"\u00ff"' | jq .
"ÿ"
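jq is not special here. As a rough sketch, again using only Go's standard encoding/json (the helper name roundTrip is made up for illustration), a plain decode followed by a re-encode performs the same rewrite, so any ordinary JSON tool in a pipeline can change which spelling goavro ends up seeing:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip is a hypothetical helper: it decodes a JSON document and
// re-encodes it with the standard library, as a generic JSON tool would.
func roundTrip(doc string) (string, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := roundTrip(`"\u00ff"`)
	if err != nil {
		panic(err)
	}
	// The escape sequence comes back as the literal character.
	fmt.Println(out) // prints "ÿ" (including the double quotes)
}
```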

rogpeppe, Dec 11 '19 10:12