Encoding problems

Open MarioSilv opened this issue 5 years ago • 2 comments

Greetings. I have a simple String object like this: String blabla = "{\"id\": 8,\"name\": \"SANTARÉM\"}". When I call JsonIterator.deserialize(blabla).get("name"), I get "SANTARɍ" instead of "SANTARÉM". I have already checked whether JsonIterator has a configuration option for string encoding, but didn't find anything.

Kind Regards,

MarioSilv, Mar 22 '19 11:03

Hey @MarioSilv, I've faced the same issue in the past, but I'm not sure whether it's the same situation as yours. Your issue would be much easier to reproduce if you gave us more context.

The best-case scenario would be for you to provide a (really) small project containing a simple, straightforward unit test that reproduces this error.

miere, Apr 25 '19 11:04

How to reproduce the problem: set the JVM default encoding to one that cannot represent the characters involved (such as US-ASCII) by adding the VM option -Dfile.encoding=US-ASCII, then run:

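  import com.jsoniter.JsonIterator;
  import com.jsoniter.any.Any;
  import com.jsoniter.output.JsonStream;

  // The class name is only a placeholder so the snippet compiles.
  public class EncodingRepro {
    public static void main(String[] args) {
      String jsonWithVeryCommonCharacterInGerman = "{\"name\":\"Thomas Müller\"}";
      Any anyFromThatJson = JsonIterator.deserialize(jsonWithVeryCommonCharacterInGerman);
      String backToText = JsonStream.serialize(anyFromThatJson);
      System.out.println(backToText);
    }
  }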

prints {"name":"Thomas M?ller"}

What happens is this:

public static final Any deserialize(String input) {
        return deserialize(input.getBytes()); //<- Uses getBytes without the option to provide the encoding
}
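
The effect of that charset-less getBytes() call can be reproduced with the JDK alone. This is a minimal sketch: the class and method names are made up for illustration, and US_ASCII is passed explicitly to stand in for a JVM started with -Dfile.encoding=US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jsoniter.
final class DefaultCharsetDemo {

    // Encodes the text with the given charset and decodes it again with the same one,
    // mimicking getBytes() followed by new String(bytes) under that default charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The u-umlaut is written as a Unicode escape so the source file encoding does not matter.
        String json = "{\"name\":\"Thomas M\u00fcller\"}";

        // US-ASCII cannot represent the u-umlaut, so getBytes substitutes '?'.
        System.out.println(roundTrip(json, StandardCharsets.US_ASCII)); // {"name":"Thomas M?ller"}

        // UTF-8 can represent it, so the round trip is lossless.
        System.out.println(roundTrip(json, StandardCharsets.UTF_8).equals(json)); // true
    }
}
```

The character is already lost at the getBytes step, before any JSON parsing takes place.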

This part is pretty easy to work around by passing in a byte array that has already been encoded with an explicit charset: JsonIterator.deserialize(toBeDeserialized.getBytes(StandardCharsets.UTF_8)); // The charset here is just an example; use the encoding the bytes are expected to be in.
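
One plausible reading of the "SANTARɍ" result from the original report is sketched below. It rests on two assumptions that this thread does not confirm: that the reporter's JVM default charset was a single-byte Latin charset such as ISO-8859-1, and that the parser decodes its input as UTF-8 without validating continuation bytes. The class and the lenientTwoByte helper are hypothetical and only model that assumed behavior.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jsoniter.
final class LenientUtf8Demo {

    // Combines two bytes as a 2-byte UTF-8 sequence WITHOUT checking that the
    // second byte is a real continuation byte (10xxxxxx). Assumed behavior only.
    static char lenientTwoByte(byte lead, byte next) {
        return (char) (((lead & 0x1F) << 6) | (next & 0x3F));
    }

    public static void main(String[] args) {
        String name = "SANTAR\u00c9M"; // \u00c9 is E with acute accent

        // ISO-8859-1 bytes: index 6 is 0xC9 (E-acute), index 7 is 0x4D ('M').
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        char merged = lenientTwoByte(latin1[6], latin1[7]);
        System.out.println(Integer.toHexString(merged)); // 24d, the code point of the reported last character

        // UTF-8 bytes: index 6 is 0xC3, index 7 is 0x89, a well-formed 2-byte sequence.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        char decoded = lenientTwoByte(utf8[6], utf8[7]);
        System.out.println(Integer.toHexString(decoded)); // c9, i.e. E-acute again
    }
}
```

Under those assumptions, the E-acute byte and the following 'M' are merged into a single character, which would explain why the output is one character shorter, and why passing UTF-8 bytes explicitly avoids the problem.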

The bigger problem is in the serialize method: JsonStream.serialize creates a new String without specifying the encoding: var4 = new String(stream.buf, 0, stream.count);
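
The same effect on the decoding side can be shown with the JDK alone. This is a rough sketch: the class and method are hypothetical, the byte array merely stands in for the serializer's internal buffer (assumed here to hold UTF-8 bytes), and US_ASCII is passed explicitly to imitate a JVM whose default charset is US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jsoniter.
final class BufferDecodingDemo {

    // Decodes the first `count` bytes of `buf` with an explicit charset,
    // instead of relying on the JVM default as new String(buf, 0, count) does.
    static String decode(byte[] buf, int count, Charset charset) {
        return new String(buf, 0, count, charset);
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"Thomas M\u00fcller\"}"; // \u00fc is u-umlaut
        byte[] buf = json.getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes as US-ASCII turns the non-ASCII bytes into U+FFFD.
        String asAscii = decode(buf, buf.length, StandardCharsets.US_ASCII);
        System.out.println(asAscii.contains("\uFFFD")); // true

        // Decoding with the charset the bytes were written in restores the text.
        String asUtf8 = decode(buf, buf.length, StandardCharsets.UTF_8);
        System.out.println(asUtf8.equals(json)); // true
    }
}
```

Unlike the deserialize case, a caller cannot fix this from outside by passing a charset, because the String is built inside the library.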

leocampos, Nov 14 '19 20:11