awkenough
Unicode
Given this JSON input from the Wikimedia API:
{"continue":{"rvcontinue":"20200405152120|949275285","continue":"||"},"query":{"pages":{"63572550":{"pageid":63572550,"ns":0,"title":"Romanian major","revisions":[{"user":"Enc\u00e6clop\u00e6dius"}]}}}}
And this Awk command:
awk -ijson 'BEGIN{jsonin=readfile("json.txt"); print jsonin; gsub("\\\\u", "\\\\u", jsonin); if( query_json(jsonin, jsona) >= 0) {print jsona["query","pages","63572550","revisions","1","user"]}}'
It should print: Enc\u00e6clop\u00e6dius But instead it prints: Enc\u00e6clopu00e6dius
The first Unicode escape keeps its backslash, but the second loses it. I tracked it down to this line:
if (++k % 2 == 1) v = v "\\"
in the function parse_json(), and resolved it by changing the line to:
v = v "\\"
I don't know whether that change breaks something else. I also had to add
gsub("\\\\u", "\\\\u", jsonin)
to the original awk command for the \ to expand correctly; I don't know why.
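The symptom is consistent with a counter that re-emits a backslash only on every odd occurrence. The sketch below is a hypothetical reduction, not the actual parse_json() source: the rebuild function and its mode argument are made up for illustration, and the only line taken from the library is the `++k % 2 == 1` test quoted above.

```shell
# Hypothetical reduction of the alternating-counter logic.
# This is NOT the real parse_json(); it only reproduces the reported symptom.
rebuild() {
    awk -v mode="$1" '
    {
        out = ""
        k = 0
        # Walk the line one character at a time and rebuild it.
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\\") {
                # "alternate": keep the backslash only on odd occurrences.
                # "always":    keep every backslash (the proposed change).
                if (mode == "always" || ++k % 2 == 1)
                    out = out "\\"
            } else {
                out = out c
            }
        }
        print out
    }'
}

printf '%s\n' 'Enc\u00e6clop\u00e6dius' | rebuild alternate
printf '%s\n' 'Enc\u00e6clop\u00e6dius' | rebuild always
```

With "alternate" the second backslash is dropped, which matches the output in the report; with "always" both escapes survive. If the counter was meant to collapse an escaped pair \\ into a single \ (an assumption, not something confirmed by the source), then removing it would also change how such pairs are handled, so that case is worth testing before adopting the change.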