redisjson-py

Unicode / Encoding issues

Open • larswise opened this issue 6 years ago • 5 comments

I'm facing some encoding issues with the client. The problem occurs when using non-ASCII characters, more precisely æøåÆØÅ, etc.

from rejson import Client, Path

client = Client(host='localhost', port=6379, decode_responses=True)

client.jsonset("test", Path.rootPath(), {'name': 'test111', 'items': []})

client.jsonget('test')  # --> {'name': 'test111', 'items': []}

client.jsonarrinsert('test', Path('.items'), 0, {'company': 'Åre', 'destination': 'ÅS', 'origin': 'LØR'})

client.jsonget('test')

This does not look correct:
{"name":"test111","items":[{"company":"\u00c3\u0085re","destination":"\u00c3\u0085S","origin":"L\u00c3\u0098R"},]}

What I had expected:
{"name":"test111","items":[{"company":"\u00c5re","destination":"\u00c5S","origin":"L\u00d8R"},]}
or
{"name":"test111","items":[{"company":"\xc5re","destination":"\xc5S","origin":"L\xd8R"},]}

If I save them as strings, they appear to get the correct encoding, but then my array elements are turned into strings instead of objects.

If I'm doing it wrong, I'd be grateful for any tips! :)

larswise • Feb 18 '19 15:02

Hello!

I just tried it too and got the same result with rejson-py. I also checked with the ReJSON CLI tool, to no avail: the problem stays the same (see the attached screenshot, "screenshot 2019-02-21 at 00 51 17").

I think the problem comes from ReJSON's internal encoding rather than from the Python client. Maybe you could check whether there is an open issue there, or open one to see if they can help you?
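The garbled output in the original report is consistent with the module escaping each UTF-8 *byte* of a non-ASCII character as its own `\u00XX` sequence, instead of escaping the character's code point once. The following minimal Python sketch reproduces both forms for comparison. `escape_utf8_bytes` is a hypothetical helper that only models the observed output (it is not ReJSON's actual serializer), while `escape_code_points` uses the standard `json` module's default ASCII-only escaping.

```python
import json


def escape_utf8_bytes(text: str) -> str:
    r"""Hypothetical model of the observed output: every UTF-8 byte >= 0x80
    becomes its own \u00XX escape (quotes and control characters are
    ignored here for brevity)."""
    parts = []
    for byte in text.encode("utf-8"):
        parts.append(chr(byte) if byte < 0x80 else f"\\u{byte:04x}")
    return "".join(parts)


def escape_code_points(text: str) -> str:
    r"""Correct ASCII-only JSON escaping: one \uXXXX escape per code point."""
    return json.dumps(text)[1:-1]  # strip the surrounding double quotes


for word in ["Åre", "ÅS", "LØR"]:
    print(word, escape_utf8_bytes(word), escape_code_points(word))
```

For `Åre` this prints `\u00c3\u0085re` next to `\u00c5re`, which matches the actual and the expected output in the report: `Å` is U+00C5, but its UTF-8 encoding is the two bytes `0xC3 0x85`.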

bentsku • Feb 20 '19 23:02

I did manage to get around it:

In Python, I am able to restore the string by re-encoding it as follows:
somevalue.encode('utf-8').decode('unicode-escape').encode('latin1').decode('utf-8')
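The chain above only works when the value still contains the literal backslash escapes (for example, raw JSON text that has not been parsed yet). A step-by-step Python sketch of what each call does follows; the function names `repair_raw` and `repair_parsed` are hypothetical. If the value has already gone through `json.loads`, the escapes have already become the characters U+00C3 and U+0085, so only the last two steps (`encode('latin1').decode('utf-8')`) are needed.

```python
import json


def repair_raw(raw: str) -> str:
    r"""Repair ASCII text that still contains literal escapes such as \u00c3\u0085re."""
    as_bytes = raw.encode("utf-8")                 # ASCII text -> bytes, backslashes intact
    unescaped = as_bytes.decode("unicode-escape")  # \u00c3 -> U+00C3, \u0085 -> U+0085
    utf8_bytes = unescaped.encode("latin1")        # U+00C3, U+0085 -> bytes 0xC3 0x85
    return utf8_bytes.decode("utf-8")              # 0xC3 0x85 -> "Å"


def repair_parsed(text: str) -> str:
    """Repair a string whose escapes json.loads has already turned into characters."""
    return text.encode("latin1").decode("utf-8")


raw = r"\u00c3\u0085re"
print(repair_raw(raw))                        # Åre
print(repair_parsed(json.loads(f'"{raw}"')))  # Åre
```

The latin1 step is what makes this work: ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF, so it recovers the original UTF-8 bytes unchanged.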

Similarly, in .NET, after fetching with JSON.MGET:

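		// Requires: using System.Text.RegularExpressions;
		public static string GetEncoded(params string[] strings)
		{
			// ISO-8859-1 maps each code point U+0000..U+00FF to exactly one byte
			var lat1 = System.Text.Encoding.GetEncoding("iso-8859-1");
			// Matches literal \uXXXX escape sequences in the raw JSON text
			Regex rx = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");
			var combined = string.Join(",", strings);
			// Turn each \uXXXX escape into the character it names (e.g. U+00C3, U+0085)
			var result = rx.Replace(combined, match =>
				((char)int.Parse(match.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)).ToString());
			// Those characters are really UTF-8 bytes: recover the bytes, then decode them as UTF-8
			var lat1bytes = lat1.GetBytes(result);
			return System.Text.Encoding.UTF8.GetString(lat1bytes);
		}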
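Because the repair works on individual strings, it can also be applied to a whole parsed document, so the array elements can stay objects instead of being stored as strings. The following is a minimal Python sketch, assuming the document has already been parsed with `json.loads`; `fix_mojibake` is a hypothetical name. It walks dicts and lists and applies the last two steps of the Python chain above (`encode('latin1').decode('utf-8')`) to every string. The try/except is only a heuristic: a string is left unchanged when it cannot be re-read as UTF-8, but genuine Latin-1 text that happens to form valid UTF-8 bytes would still be converted.

```python
import json
from typing import Any


def fix_mojibake(value: Any) -> Any:
    """Recursively re-decode strings whose characters are really UTF-8 bytes."""
    if isinstance(value, str):
        try:
            # Characters U+0000..U+00FF map 1:1 to bytes; re-read those bytes as UTF-8.
            return value.encode("latin1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not mis-decoded UTF-8 (e.g. an already correct "Åre"); keep it unchanged.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(key): fix_mojibake(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through untouched


doc = json.loads(
    '{"name": "test111", "items": [{"company": "\\u00c3\\u0085re", '
    '"destination": "\\u00c3\\u0085S", "origin": "L\\u00c3\\u0098R"}]}'
)
print(fix_mojibake(doc))
# {'name': 'test111', 'items': [{'company': 'Åre', 'destination': 'ÅS', 'origin': 'LØR'}]}
```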

larswise • Mar 02 '19 01:03

Having problems as well.

JSON.SET foo . '"bãr"'
OK
JSON.GET foo .
"\"b\\u00c3\\u00a3r\""

When I remove the duplicated backslashes and decode the result, I get bãr.

mschipperheyn • Apr 08 '19 03:04

I believe the JSON.GET command now has a no-escape option (NOESCAPE) that returns special characters without escaping them, as mentioned in the replies to this issue. Maybe we could add it as an option to the Python command? I can try to add it if you'd like.

RedisJSON/RedisJSON#98
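A sketch of what such an option might look like on the Python side, assuming the `NOESCAPE` flag of `JSON.GET` as documented for RedisJSON 1.x (`JSON.GET <key> [INDENT ...] [NEWLINE ...] [SPACE ...] [NOESCAPE] [path ...]`). `build_jsonget_args` and its `no_escape` parameter are hypothetical and not part of rejson-py; the function only builds the command arguments, so it can be tried without a server.

```python
from typing import List


def build_jsonget_args(key: str, *paths: str, no_escape: bool = False) -> List[str]:
    """Hypothetical helper: build the argument list for a JSON.GET call."""
    args = ["JSON.GET", key]
    if no_escape:
        # Assumed RedisJSON 1.x syntax: NOESCAPE comes before the paths
        args.append("NOESCAPE")
    # With no explicit path, ask for the root of the document
    args.extend(paths or (".",))
    return args


print(build_jsonget_args("foo", no_escape=True))  # ['JSON.GET', 'foo', 'NOESCAPE', '.']
```

With redis-py, the list could then be sent as `redis.Redis().execute_command(*build_jsonget_args("foo", no_escape=True))` and the returned JSON text parsed with `json.loads`; this assumes a server with the RedisJSON module loaded.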

bentsku • Jul 22 '19 22:07

@bentsku, if you can submit a PR, that would be great.

gkorland • Jul 23 '19 08:07