awkenough

Unicode

Open greencardamom opened this issue 4 years ago • 0 comments

Given this JSON input from Wikimedia API:

{"continue":{"rvcontinue":"20200405152120|949275285","continue":"||"},"query":{"pages":{"63572550":{"pageid":63572550,"ns":0,"title":"Romanian major","revisions":[{"user":"Enc\u00e6clop\u00e6dius"}]}}}}

And this Awk command:

awk -ijson 'BEGIN{jsonin=readfile("json.txt"); print jsonin; gsub("\\\\u", "\\\\u", jsonin); if( query_json(jsonin, jsona) >= 0) {print jsona["query","pages","63572550","revisions","1","user"]}}'

It should print:

Enc\u00e6clop\u00e6dius

But instead it prints:

Enc\u00e6clopu00e6dius

It kept the backslash of the first Unicode escape but dropped the backslash of the second. I tracked it down to this line in function parse_json():

if (++k % 2 == 1) v = v "\\"

and resolved it by changing it to:

v = v "\\"
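
A minimal sketch of how a parity counter like that one could produce the output above. This is a hypothetical reduction of the symptom, not the parse_json() source: the helper name rebuild and its "odd" and "every" modes are made up for illustration, and the real function may count backslashes for a different reason.

```sh
#!/bin/sh
# Hypothetical reduction, NOT the awkenough code: copy a string character by
# character, re-emitting each backslash either every time ("every") or only
# on odd occurrences ("odd"), mirroring the `++k % 2 == 1` test.
rebuild() {
    # $1 = mode ("odd" or "every"), $2 = input string
    printf '%s\n' "$2" | awk -v mode="$1" '{
        out = ""; k = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\\") {
                # In "odd" mode only the 1st, 3rd, ... backslash survives
                if (mode == "every" || ++k % 2 == 1) out = out "\\"
            } else {
                out = out c
            }
        }
        print out
    }'
}

rebuild odd   'Enc\u00e6clop\u00e6dius'   # prints Enc\u00e6clopu00e6dius
rebuild every 'Enc\u00e6clop\u00e6dius'   # prints Enc\u00e6clop\u00e6dius
```

With the counter, the second backslash in the string is the even one and gets dropped, which matches the output I saw. If the counter is there to collapse an escaped pair \\ into one backslash, removing it would double those up instead, so that case would need checking.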

I don't know whether that change breaks something else. I also had to call gsub("\\\\u", "\\\\u", jsonin) in the original awk command for the \ to expand correctly; I don't know why.

greencardamom, Apr 10 '20 14:04