jackson-databind

When JSON contains special escape characters, some characters will be lost after parsing

Open git-chenhao opened this issue 5 months ago • 7 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Describe the bug

Version Information

2.16.1

Reproduction


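import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class Repro {
    public static void main(String[] args) throws JsonProcessingException {
        // The JSON text contains the six-character escape \u2028, not the raw character
        String body = "{\"name\":\"\\u2028澜\"}";
        System.out.println(body);
        ObjectMapper objectMapper = new ObjectMapper();
        // Parsing decodes the escape into the single character U+2028
        Map<?, ?> map = objectMapper.readValue(body, Map.class);
        String json = objectMapper.writeValueAsString(map);
        System.out.println(json);
        System.out.println(body.equals(json));
    }
}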


// {"name":"\u2028澜"}
// {"name":"
澜"}
// false

Expected behavior

No response

Additional context

No response

git-chenhao, Jan 16 '24 09:01

There is no Jackson class that I know of called Jsons.

pjfanning, Jan 16 '24 09:01

> There is no Jackson class that I know of called Jsons.

Updated.

git-chenhao, Jan 16 '24 09:01

@pjfanning Could you help take a look?

git-chenhao, Jan 16 '24 09:01

Unicode 2028 is a line separator - have you proved that the char in // {"name":"
澜"} is not unicode 2028? Your OS may display unicode 2028 as simple whitespace. There is no obligation for Jackson internals to escape the unicode chars in \u2028 format when it outputs them.
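
One way to settle this without relying on how a terminal renders the character is to print the code points of the decoded value. A minimal sketch using only the JDK; `CodePointDump` and `dump` are hypothetical names and not part of Jackson:

```java
public class CodePointDump {
    /** Renders each code point as U+XXXX so invisible characters become visible. */
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // A line separator between two ASCII letters
        String value = "a" + (char) 0x2028 + "b";
        System.out.println(dump(value)); // prints "U+0061 U+2028 U+0062"
    }
}
```

Calling `dump` on the `name` value read back from the Map would show `U+2028` as the first entry if the line separator survived parsing.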

pjfanning, Jan 16 '24 10:01

@git-chenhao Please do NOT use System.out.println() for testing equality; what gets printed may or may not look the same.

In addition, there is no guarantee that Unicode escaping is preserved when reading: what matters are the logical String values. So a test would need to show that the decoded String is not what it should be -- basically, if you read the String produced by mapper.writeValueAsString(value), it should result in the same Map as reading the original String.

At this point a unit test would be needed to show incorrect handling of Unicode escapes.
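
A minimal sketch of the kind of check described here, assuming jackson-databind 2.x is on the classpath. `RoundTripCheck` and its method names are hypothetical; the point is that it compares decoded values rather than JSON text:

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.UncheckedIOException;
import java.util.Map;

public class RoundTripCheck {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Reads JSON, writes it back, reads it again; true if both Maps are equal. */
    public static boolean survivesRoundTrip(String json) {
        try {
            Map<?, ?> first = MAPPER.readValue(json, Map.class);
            String rewritten = MAPPER.writeValueAsString(first);
            Map<?, ?> second = MAPPER.readValue(rewritten, Map.class);
            return first.equals(second);
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the decoded "name" value so it can be compared as a logical String. */
    public static String decodedName(String json) {
        try {
            Map<?, ?> map = MAPPER.readValue(json, Map.class);
            return (String) map.get("name");
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real bug report would need one of these checks to fail, for example `decodedName` returning a String whose first character is not U+2028 for the input from the reproduction.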

cowtowncoder, Jan 17 '24 00:01

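> @git-chenhao Please do not use System.out.println() for testing equality; what gets printed may or may not look the same.
>
> In addition, there is no guarantee that Unicode escaping is preserved in some way when reading: what matters are the logical String values. So the test needs to show that the decoded String is not what it should be; basically, if you read the String produced by mapper.writeValueAsString(value), the result should be the same Map as for the original String.
>
> At this point a unit test is needed to show incorrect handling of Unicode escapes.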

@cowtowncoder Don't worry about System.out.println(), because body.equals(json) is false. I found that there is a JsonpCharacterEscapes class used to handle the Unicode characters 0x2028 and 0x2029. Will using this have any other effects?

git-chenhao, Jan 17 '24 03:01

If the goal is to influence which characters are and which are not escaped, CharacterEscapes is the mechanism and yes, JsonpCharacterEscapes in particular might work here. Its only effect should be added escaping, and possibly some minor performance overhead. Otherwise no effect.
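
As a rough sketch of how that could be wired up, assuming jackson-databind 2.x: `JsonpCharacterEscapes.instance()` and `ObjectMapper.writer(CharacterEscapes)` are existing Jackson APIs, while the `EscapedLineSeparators` class and its `write` method are hypothetical names used only for illustration:

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.util.JsonpCharacterEscapes;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import java.io.UncheckedIOException;

public class EscapedLineSeparators {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Writer configured to escape U+2028 and U+2029 on output
    private static final ObjectWriter ESCAPING_WRITER =
            MAPPER.writer(JsonpCharacterEscapes.instance());

    /** Serializes the value, escaping line and paragraph separators. */
    public static String write(Object value) {
        try {
            return ESCAPING_WRITER.writeValueAsString(value);
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only affects how the two separator characters are written. Other escapes and whitespace in the input are still not preserved, so the output is not guaranteed to match arbitrary input JSON text character for character.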

cowtowncoder, Jan 17 '24 19:01

No test to actually reproduce the problem (printing out results is not a reliable means of showing an issue), closing. May be re-opened/re-filed with a reproduction.

cowtowncoder, May 13 '24 01:05