Unicode converting from UTF-16 to UTF-32 then failing on JSON.parse

Open cathykc opened this issue 2 years ago • 5 comments

We're storing an object via hset that contains the following Unicode character (an emoji): \ud83e\udef6. When we retrieve the value via hgetall, we get \U0001faf6 back, which is the UTF-32 encoding of the original character. This causes the deserializer to fail.

Is there a reason these characters are being converted from one encoding to another? And what would be the recommended workaround here?

Error we receive back from hgetall:

FetchError: invalid json response body at [URL] reason: Unexpected token U in JSON at position 330

We then fetched the response directly from the /hgetall/ REST API to inspect it (res.text()), and it shows that the Unicode character was transformed to UTF-32.

cathykc avatar Jul 05 '22 20:07 cathykc
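
For readers following along, the parse failure can be reproduced without Redis at all. The sketch below assumes only standard JSON.parse behavior: a JSON string escape is a lowercase \u followed by exactly four hex digits, so a surrogate pair parses while the \U form does not. The payloads are shortened versions of the one in this report, and the tryParse helper is hypothetical.

```javascript
// Hypothetical helper: returns the parsed value, or the error if parsing fails.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    return { ok: false, error };
  }
}

// What was written: a UTF-16 surrogate pair escape, which is valid JSON.
const surrogateEscaped = '{"jsonKey":"Some text \\ud83e\\udef6"}';
// What came back: an 8-digit \U escape, which JSON does not define.
const longEscaped = '{"jsonKey":"Some text \\U0001faf6"}';

const good = tryParse(surrogateEscaped);
const bad = tryParse(longEscaped);

console.log(good.ok, good.value.jsonKey); // parses, value contains the emoji
console.log(bad.ok, bad.error.name);      // fails with a SyntaxError
```

The exact error message and position depend on the JavaScript engine and on the full response body, so only the error type is checked here.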

Perhaps the solution here is to be able to define a custom serializer? I notice we can do that for deserializers.

cathykc avatar Jul 05 '22 21:07 cathykc

Hmm, we're not doing anything other than calling JSON.parse when deserializing. I'm not sure what's causing this but I'll try it myself.

Yeah, can you try it without deserialization:

new Redis({
  // ...
  automaticDeserialization: false
})

that will give you the string response straight from redis.

If that also converts the string into UTF-32, then we'll need to fix it in our redis server.

chronark avatar Jul 06 '22 08:07 chronark

Here's a min repro case

const key = "repro";

// Unicode goes in as \ud83e\udef6
await redis.hset(key, { key: { jsonKey: "Some text \ud83e\udef6" } });
try {
  // This throws an error: FetchError: invalid json response body
  const userInfo = await redis.hgetall(key);
} catch (e) {
  const res = await fetch(`${process.env.UPSTASH_REDIS_REST_URL}/hgetall/${key}`, {
    headers: {
      Authorization: `Bearer ${process.env.UPSTASH_REDIS_REST_TOKEN}`,
    },
  });
  console.error(await res.text());
  // Unicode returned as \U0001faf6 {"result":["key","{\"jsonKey\":\"Some text \U0001faf6\"}"]}
}

Using automaticDeserialization: false results in the same!

cathykc avatar Jul 06 '22 19:07 cathykc

Thanks, I will try to debug this asap, but I can't promise you anything until the end of next week.

chronark avatar Jul 07 '22 07:07 chronark

Hey @cathykc, I did some more testing and I believe it is actually the JSON.stringify method that causes the issue.

The easiest way around that is to escape the backslashes in your value.

import { Redis } from "@upstash/redis";
import "isomorphic-fetch";

const value = "Some text \\ud83e\\udef6";

async function main() {
  const redis = Redis.fromEnv();

  await redis.hset("upstash", { value });
  console.log(await redis.hgetall("upstash"));
}
main();

I tried some other ways to include a custom serializer but that didn't solve the problem unfortunately. So I hope this works for you.

chronark avatar Jul 18 '22 10:07 chronark
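
One consequence of this workaround is worth spelling out. The sketch below relies only on plain JavaScript string semantics and says nothing about what @upstash/redis does internally: with doubled backslashes the stored string no longer contains the emoji, only the twelve ASCII characters of its escape text, so whoever reads the value back has to unescape it. The unescapeJsonText helper is hypothetical.

```javascript
// Single backslashes: the literal holds the real emoji (2 UTF-16 code units).
const withEmoji = "Some text \ud83e\udef6";
// Doubled backslashes: the literal holds the 12 characters \ud83e\udef6 as text.
const escaped = "Some text \\ud83e\\udef6";

// Hypothetical helper: treats the text as the body of a JSON string literal.
// Only safe when the text contains no unescaped double quotes or raw control
// characters; otherwise JSON.parse throws.
function unescapeJsonText(text) {
  return JSON.parse(`"${text}"`);
}

console.log(withEmoji.length, escaped.length);
console.log(unescapeJsonText(escaped) === withEmoji);
```

Because the escaped form is pure ASCII, there is nothing for a server to re-encode, which may be why it sidesteps the problem.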

I'll close this due to inactivity, please reopen if you have more questions

chronark avatar Aug 22 '22 06:08 chronark

Hey! Sorry for not replying - we ended up solving this with encodeURI.

Setting: encodeURI(JSON.stringify(VALUE_TO_STORE_IN_REDIS))

Getting: JSON.parse(decodeURI(VALUE_RETREIVED_FROM_REDIS)), which returns the original Unicode characters

Thanks for looking into this!

cathykc avatar Aug 22 '22 18:08 cathykc
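
The encodeURI workaround above can be sketched as a pair of helpers. This is a minimal illustration and not code from the thread: the names encodeForRedis and decodeFromRedis are hypothetical, and the Redis calls themselves are left out so the round trip can be checked locally. encodeURI percent-encodes the UTF-8 bytes of every non-ASCII character, so the stored string is ASCII only.

```javascript
// Hypothetical helper: serialize to JSON, then percent-encode non-ASCII bytes.
function encodeForRedis(value) {
  return encodeURI(JSON.stringify(value));
}

// Hypothetical helper: reverse of encodeForRedis.
function decodeFromRedis(stored) {
  return JSON.parse(decodeURI(stored));
}

const original = { jsonKey: "Some text \ud83e\udef6" };
const stored = encodeForRedis(original);   // this string is what would be passed to hset
const restored = decodeFromRedis(stored);  // this is what would be done after hgetall

console.log(stored);
console.log(restored.jsonKey === original.jsonKey);
```

With this approach automatic deserialization would presumably need to be disabled, or the stored field read as a plain string, so that decodeURI runs before JSON.parse.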

Running into the same issue!

Very easy to reproduce, just run:

➜ SET bugtest 🫠
OK
➜ GET bugtest
Bad escaped character in JSON at position 12

cvle avatar Sep 29 '22 16:09 cvle

Running into the same issue as @cvle. On the backend I replaced upstash/redis with ioredis because setting automaticDeserialization = false did not help. On the frontend the issue is the same, but there I can't use anything other than the Upstash REST API...

@chronark do you think there will be an option in the future for the REST API to return un-deserialized (raw) data from redis?

pavvell avatar Jul 03 '23 14:07 pavvell

The problem is that the HTTP API speaks JSON, and thus we need to call JSON.parse on it, which automatically messes with the encoding.

To prevent this, we have started to encode the response in base64, and you can opt out of automatic deserialization.

I think this might be on the server side and we will take another look into it

cc @mdogan

chronark avatar Jul 03 '23 17:07 chronark
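
As a rough illustration of why base64 helps here: the encoded text is plain ASCII, so no JSON layer in between has any escape sequences to rewrite. The sketch below is not the Upstash implementation, and both helper names are hypothetical. It assumes a runtime with the standard TextEncoder, TextDecoder, btoa and atob globals (browsers and recent Node versions).

```javascript
// Hypothetical helper: UTF-8 encode the text, then base64 the bytes.
function toBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Hypothetical helper: reverse of toBase64.
function fromBase64(encoded) {
  const binary = atob(encoded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const raw = "Some text \ud83e\udef6";
const wire = toBase64(raw);      // ASCII only, safe to embed in a JSON string
const decoded = fromBase64(wire);

console.log(wire);
console.log(decoded === raw);
```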