exec-maven-plugin useMavenLogger breaks nōn-ASCII characters (i.e. 99.9̅% of Unicode)

I saw this in one of my scripts, but I reduced the script to just…

#!/bin/sh
echo mäh
exit 0

… for the reproduction of this.

$ mvn org.codehaus.mojo:exec-maven-plugin:exec@build-depsrcs -Dexec.useMavenLogger=false
[INFO] Scanning for projects...
[INFO]
[INFO] --------------------< org.evolvis.tartools:csvfile >--------------------
[INFO] Building org.evolvis.tartools:csvfile 3.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:3.0.0:exec (build-depsrcs) @ csvfile ---
mäh
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.262 s
[INFO] Finished at: 2020-06-21T17:48:46+02:00
[INFO] ------------------------------------------------------------------------

… vs…

$ mvn org.codehaus.mojo:exec-maven-plugin:exec@build-depsrcs -Dexec.useMavenLogger=true
[INFO] Scanning for projects...
[INFO] 
[INFO] --------------------< org.evolvis.tartools:csvfile >--------------------
[INFO] Building org.evolvis.tartools:csvfile 3.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- exec-maven-plugin:3.0.0:exec (build-depsrcs) @ csvfile ---
[INFO] [main] mￃﾤh
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.685 s
[INFO] Finished at: 2020-06-21T17:49:00+02:00
[INFO] ------------------------------------------------------------------------

Watch it completely destroy the umlaut.

IMHO, the conversion between script output and the string passed to the logger SHOULD use the locale encoding, and if that is not possible, or if the locale encoding is ASCII, it MUST use UTF-8.
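A minimal sketch of that rule in Java, to make the proposal concrete. This is not the plugin's code: `chooseCharset` and the class name are hypothetical, and reading the locale encoding from the `native.encoding` system property assumes Java 17 or later (on older JVMs the property is absent and the sketch falls back to UTF-8).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class OutputCharsetSketch {
    // Hypothetical helper: decide which charset to use when decoding the
    // child process output before handing it to the logger.
    static Charset chooseCharset(String localeEncodingName) {
        try {
            Charset cs = Charset.forName(localeEncodingName);
            // ASCII cannot represent "ä" at all, so treat it like "unknown".
            return StandardCharsets.US_ASCII.equals(cs) ? StandardCharsets.UTF_8 : cs;
        } catch (IllegalArgumentException e) {
            // Covers a null name as well as illegal or unsupported charset names.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        // Assumption: "native.encoding" (Java 17+) reflects the locale encoding.
        String localeName = System.getProperty("native.encoding");
        Charset cs = chooseCharset(localeName);
        byte[] scriptOutput = {'m', (byte) 0xC3, (byte) 0xA4, 'h'};
        System.out.println(cs + " -> " + new String(scriptOutput, cs));
    }
}
```

The fallback only kicks in when the locale encoding is missing, unknown to the JVM, or ASCII; any other locale encoding is used as-is.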

Cc @hankolerd

Jun 21 '20 15:06 mirabilos

The wide characters in question are U+FFC3 and U+FFA4, which is what you get when you take the UTF-8 encoding of ä (\xC3\xA4), read it byte by byte, and (wrongly) sign-extend each byte to a Unicode code point.
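For illustration, here is one way this exact pair of code points can come about in Java, where `byte` is signed. Whether the plugin does precisely this is an assumption; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

final class SignExtensionDemo {
    // Buggy pattern: casting a signed byte directly to char sign-extends it
    // to int first, then keeps the low 16 bits.
    static String widenNaively(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append((char) b); // (byte) 0xC3 is -61, which becomes U+FFC3
        }
        return sb.toString();
    }

    // Correct pattern: decode the whole byte sequence with its real encoding.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {'m', (byte) 0xC3, (byte) 0xA4, 'h'}; // UTF-8 for "mäh"
        for (char c : widenNaively(raw).toCharArray()) {
            System.out.printf("U+%04X ", (int) c); // U+006D U+FFC3 U+FFA4 U+0068
        }
        System.out.println();
        System.out.println(decodeUtf8(raw)); // the intended string, U+006D U+00E4 U+0068
    }
}
```

Masking with `b & 0xFF` would avoid the sign extension but would still yield Latin-1 mojibake (U+00C3 U+00A4), so the bytes need to be decoded as a sequence, not one at a time.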

@hankolerd

Jun 21 '20 15:06 mirabilos

Which version contains the fix? (From a user’s PoV, it’s better to keep bug reports open until they can actually install a fixed version.)

But thanks for fixing it.

Jul 10 '23 23:07 mirabilos

@mirabilos It has been released in 3.1.1

Nov 20 '23 22:11 jebeaudet

OK.

As a test case, I tried writing a Latin-1 mäh first, then a UTF-8 mäh.

With LC_ALL=C.UTF-8:

[INFO] m�h
[INFO] mäh

With LC_ALL=C:

[INFO] m?h
[INFO] m??h

So it’s definitely decoding the bytes into wide characters, then converting them back to multibyte characters in the current locale. This is precisely the follow-up bug I already warned about… but it’s an improvement over the situation before, at least.
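The pattern in both outputs matches a plain decode-then-re-encode round trip through the locale charset. A small Java sketch that reproduces it, assuming (not confirmed from the plugin source) that both steps use the same charset; the names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Decode with the locale charset, then encode again with the same charset.
    // Undecodable bytes become U+FFFD; unencodable characters become '?'.
    static byte[] roundTrip(byte[] raw, Charset localeCharset) {
        String decoded = new String(raw, localeCharset);
        return decoded.getBytes(localeCharset);
    }

    // Render bytes as space-separated hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] latin1 = {'m', (byte) 0xE4, 'h'};             // Latin-1 "mäh"
        byte[] utf8 = {'m', (byte) 0xC3, (byte) 0xA4, 'h'};  // UTF-8 "mäh"
        Charset[] locales = {StandardCharsets.UTF_8, StandardCharsets.US_ASCII};
        for (Charset cs : locales) {
            System.out.println(cs + ": " + hex(roundTrip(latin1, cs))
                    + " | " + hex(roundTrip(utf8, cs)));
        }
    }
}
```

In a UTF-8 locale the Latin-1 byte is replaced by U+FFFD (bytes EF BF BD) while the UTF-8 input passes through unchanged; in an ASCII locale each high byte ends up as a single `?`, giving `m?h` and `m??h`.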

Nov 22 '23 17:11 mirabilos

As Slawomir said, comments on commits are easily missed; I never saw that thread.

As for the issue, I'll repeat what he said: you can open a PR with a tentative fix.

Nov 22 '23 17:11 jebeaudet

Hi - this issue is also closed, so if something is still wrong, please:

create a new issue with a description and better reproduction steps
create a PR with a proposed fix

Comments on closed issues can also be missed.

Nov 22 '23 19:11 slawekjaranowski
