goavro
textual format does not correctly decode unicode strings
In JSON, the Unicode characters with code points 0x7f to 0xff can be encoded either as those characters directly, or with a Unicode escape sequence (e.g. \u00ff).
goavro should therefore treat both forms identically when decoding, but it does not.
This code demonstrates the issue: https://play.golang.org/p/FxpmTjfmI15
As a result, JSON that has been produced by, or round-tripped through, a normal JSON encoder cannot be reliably decoded with goavro.
For example, Avro JSON data piped through the jq command can be corrupted, because jq rewrites the escape sequence as the literal character:
% echo '"\u00ff"' | jq .
"ÿ"
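For comparison, here is a minimal sketch of the behaviour I would expect, using only Go's standard encoding/json package rather than goavro. The helper name decodeJSONString is made up for this illustration. Both spellings of the string decode to the same Go value:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper for this illustration.
// It decodes a single JSON string literal with the standard library.
func decodeJSONString(literal string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(literal), &s)
	return s, err
}

func main() {
	// The same character, once as an escape sequence and once written directly.
	escaped, err := decodeJSONString(`"\u00ff"`)
	if err != nil {
		panic(err)
	}
	direct, err := decodeJSONString(`"ÿ"`)
	if err != nil {
		panic(err)
	}

	// Both forms yield the same Go string (U+00FF, UTF-8 bytes c3 bf).
	fmt.Println(escaped == direct) // prints: true
	fmt.Printf("%x %x\n", escaped, direct)
}
```

A decoder for the Avro textual format would presumably need the same equivalence, so that output from jq or any other JSON encoder decodes the same way as the escaped form.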