
Update properties bundle to UTF-8

Open spassarop opened this issue 9 months ago • 4 comments

I have been modifying the bundle's error-message properties files, saving them as UTF-8 without the explicit \uXXXX escapes for non-ASCII characters. However I am not sure these changes are enough because of the messages I get printed on standard output with broken characters.

I need someone else with more practical knowledge about configuration+runtime on Java to point out what else is needed in this PR. I tried to follow what was discussed on #456.

spassarop avatar Mar 09 '25 16:03 spassarop

However I am not sure these changes are enough because of the messages I get printed on standard output with broken characters.

Can you elaborate? How are you trying to print these out on stdout? By just running something like (say):

$ cat src/main/resources/AntiSamy_de_DE.properties

or via some specific JUnit test or what? Just pure speculation, but there may be environment variables set that potentially affect the output. For example, on Linux Mint 21.3:

$ env | grep -i LANG
GDM_LANG=en_US
LANG=en_US.UTF-8
LANGUAGE=en_US

I'm not sure if any of those affect what the output looks like, but I'd guess that $LANG potentially does. If you can describe one test you are running where it's not giving the expected results, and list the expected output, I can see if I can provide any additional insight. If not, maybe I can ask Matt, as he seems to understand code points way better than I do.
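
To make the environment question concrete, here is a minimal sketch that dumps the settings most likely to influence how the JVM decodes files and encodes console output. The class name `EncodingReport` is made up for illustration, and some of the listed system properties only exist on newer JDKs, so a `null` value is expected in places.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    /** Collects the charset-related settings visible to the running JVM. */
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("defaultCharset", Charset.defaultCharset().name());
        // Some of these properties are absent on older JDKs or on some platforms.
        String[] keys = {"file.encoding", "native.encoding", "stdout.encoding", "sun.jnu.encoding"};
        for (String key : keys) {
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        // LANG is typically unset on Windows, so "null" is a normal value there.
        report.put("LANG", String.valueOf(System.getenv("LANG")));
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Comparing this output between a machine that prints the messages correctly and one that does not should show whether the difference is in the JVM's settings or somewhere else, such as the terminal font or code page.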

kwwall avatar Mar 09 '25 21:03 kwwall

@spassarop - are you going to try to research further/address @kwwall's comments?

davewichers avatar Mar 24 '25 14:03 davewichers

I am using Windows :I

I saved all the properties files as UTF-8 with VS Code after explicitly opening them as ISO-8859-1. Some of them required manual changes before I could visually confirm they were being saved correctly; that was not necessary for all languages.

Ended up running this with JUnit to get the characters right:

    Properties properties = new Properties();
    try (java.io.InputStreamReader reader = new java.io.InputStreamReader(
            getClass().getClassLoader().getResourceAsStream("AntiSamy_it_IT.properties"),
            java.nio.charset.StandardCharsets.UTF_8)) {
      properties.load(reader);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    properties.forEach((key, value) -> System.out.println(key + " = " + value));
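
The explicit reader matters because `Properties.load(InputStream)` is documented to assume ISO-8859-1, while `Properties.load(Reader)` uses whatever charset the reader was built with. A small in-memory sketch of the difference follows; the key `why` and its Italian value are made up and are not taken from the AntiSamy bundles.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesCharsetDemo {
    // Hypothetical one-line properties file, encoded as UTF-8 (the value ends in U+00E9).
    static final byte[] UTF8_FILE = "why=perch\u00e9\n".getBytes(StandardCharsets.UTF_8);

    /** Loads through a Reader that decodes UTF-8, as in the snippet above. */
    static String viaReader() {
        Properties properties = new Properties();
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(UTF8_FILE), StandardCharsets.UTF_8)) {
            properties.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty("why");
    }

    /** Loads through the raw stream, which Properties decodes as ISO-8859-1. */
    static String viaStream() {
        Properties properties = new Properties();
        try (ByteArrayInputStream in = new ByteArrayInputStream(UTF8_FILE)) {
            properties.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty("why");
    }
}
```

`viaReader()` returns the intended string, while `viaStream()` returns the two-character mojibake form, because each byte of the UTF-8 sequence is decoded as its own ISO-8859-1 character.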

That printed OK on the IntelliJ IDEA console. But then I retried my initial approach of printing the messages bundle with the right nomenclature in the parameters and it also worked:

    messages = ResourceBundle.getBundle("AntiSamy", new Locale("zh", "CN"));
    Enumeration<String> keys = messages.getKeys();
    while (keys.hasMoreElements()) {
      String key = keys.nextElement();
      String value = messages.getString(key);
      System.out.println(key + " = " + value);
    }
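
One likely reason this works without any extra code: since Java 9, `ResourceBundle` reads `.properties` bundles as UTF-8 by default and only falls back to ISO-8859-1 when the bytes are not valid UTF-8, whereas Java 8 and earlier always read them as ISO-8859-1. If the bundle also has to load correctly on Java 8, a custom `ResourceBundle.Control` is one option. The following is a sketch under that assumption, not something taken from the PR; `Utf8BundleDemo`, `Utf8Control`, and the `Demo_it.properties` file are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8BundleDemo {
    /** Reads .properties bundles as UTF-8 regardless of the JDK's default. */
    static class Utf8Control extends ResourceBundle.Control {
        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload)
                throws IllegalAccessException, InstantiationException, IOException {
            if (!"java.properties".equals(format)) {
                // Leave class-based bundles to the default implementation.
                return super.newBundle(baseName, locale, format, loader, reload);
            }
            String resource = toResourceName(toBundleName(baseName, locale), "properties");
            InputStream in = loader.getResourceAsStream(resource);
            if (in == null) {
                return null; // no file for this candidate locale
            }
            // The reload flag is ignored in this sketch.
            try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
                return new PropertyResourceBundle(reader);
            }
        }
    }

    /** Writes a hypothetical Demo_it.properties to a temp directory and loads it back. */
    static String loadSample() {
        try {
            Path dir = Files.createTempDirectory("bundle-demo");
            Files.write(dir.resolve("Demo_it.properties"),
                    "why=perch\u00e9\n".getBytes(StandardCharsets.UTF_8));
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                ResourceBundle bundle =
                        ResourceBundle.getBundle("Demo", Locale.ITALIAN, loader, new Utf8Control());
                return bundle.getString("why");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that `getBundle` overloads taking a `Control` are not supported when the caller is in a named module, so this approach only applies to classpath-based code.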

(screenshot attached in the original issue)

I call it a win. But I don't like the whole works-on-my-machine results. Is there something better to test?
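
One check that does not depend on the console at all is to compare code points instead of looking at rendered glyphs. A minimal sketch, with a made-up helper name `toCodePoints`:

```java
import java.util.stream.Collectors;

public class CodePointDump {
    /** Renders a string as space-separated U+XXXX code points. */
    static String toCodePoints(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The two characters of "Chinese" (zhongwen), written as escapes so the
        // source file encoding cannot interfere.
        System.out.println(toCodePoints("\u4e2d\u6587"));
    }
}
```

In a JUnit test, the value returned by `messages.getString(key)` could be passed through a helper like this and compared with the expected code points for a known message. The output is plain ASCII, so it reads the same on any terminal and separates "the bundle was decoded wrongly" from "the console cannot display it".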

spassarop avatar Mar 28 '25 23:03 spassarop

@spassarop wrote:

I call it a win. But I don't like the whole works-on-my-machine results. Is there something better to test?

Ha! That's Java for you. Write once, test everywhere.

Seriously, I think the only thing you can do is to ask @GodMeowIceSun to test it via your revised code and JUnit test. Since he (?) is the one who created issue #456, who better to verify it? Another alternative, if you have a mailing list or OWASP Slack channel, is to ask someone who has their Java runtime environment configured for Chinese characters to take a crack at testing it. But beyond that, I've got nothing. I think you've already done due diligence on this.

kwwall avatar Mar 29 '25 18:03 kwwall