exec-maven-plugin
useMavenLogger breaks nōn-ASCII characters (i.e. 99.9̅% of Unicode)
I saw this in one of my scripts, but I reduced the script to just…
#!/bin/sh
echo mäh
exit 0
… for the reproduction of this.
$ mvn org.codehaus.mojo:exec-maven-plugin:exec@build-depsrcs -Dexec.useMavenLogger=false
[INFO] Scanning for projects...
[INFO]
[INFO] --------------------< org.evolvis.tartools:csvfile >--------------------
[INFO] Building org.evolvis.tartools:csvfile 3.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:3.0.0:exec (build-depsrcs) @ csvfile ---
mäh
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.262 s
[INFO] Finished at: 2020-06-21T17:48:46+02:00
[INFO] ------------------------------------------------------------------------
… vs…
$ mvn org.codehaus.mojo:exec-maven-plugin:exec@build-depsrcs -Dexec.useMavenLogger=true
[INFO] Scanning for projects...
[INFO]
[INFO] --------------------< org.evolvis.tartools:csvfile >--------------------
[INFO] Building org.evolvis.tartools:csvfile 3.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:3.0.0:exec (build-depsrcs) @ csvfile ---
[INFO] [main] mᅢᄂh
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.685 s
[INFO] Finished at: 2020-06-21T17:49:00+02:00
[INFO] ------------------------------------------------------------------------
Watch it completely destroy the umlaut.
IMHO, the conversion between the script output and the string passed to the logger SHOULD use the locale encoding, and if that is not possible or the locale encoding is ASCII, it MUST use UTF-8.
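The proposed rule can be sketched in shell. This is a minimal sketch, assuming the POSIX `locale charmap` query reports the current locale's encoding; `pick_charset` is a hypothetical helper, not part of exec-maven-plugin, and treating the names `ANSI_X3.4-1968`, `US-ASCII` and `ASCII` as "the locale is ASCII" is an assumption about what common C libraries report for the C locale.

```sh
#!/bin/sh
# Hypothetical helper (not part of exec-maven-plugin): pick the charset for
# decoding script output using the rule "use the locale encoding, but fall
# back to UTF-8 if it is unavailable or only ASCII".
pick_charset() {
    cs=$(locale charmap 2>/dev/null)
    case $cs in
    '' | ANSI_X3.4-1968 | US-ASCII | ASCII)
        # No usable locale encoding, or only ASCII: assume UTF-8
        echo UTF-8 ;;
    *)
        # Otherwise trust the locale encoding (e.g. UTF-8, ISO-8859-15)
        echo "$cs" ;;
    esac
}

pick_charset
```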
Cc @hankolerd
The wide characters in question are U+FFC3 (ᅢ) and U+FFA4 (ᄂ), which is what happens when you take the UTF-8 encoding of ä (\xC3\xA4), read it byte by byte and (wrongly) sign-extend each byte to a 16-bit Unicode code unit.
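The arithmetic can be reproduced in shell. This sketch only models the faulty conversion: `sign_extend_byte` is a hypothetical helper, and it assumes each byte is first treated as a signed 8-bit value (as Java's `byte` type is) and then truncated to a 16-bit code unit. Decoding the same byte pair as UTF-8 would instead yield the single character U+00E4 (ä).

```sh
#!/bin/sh
# Hypothetical helper: interpret a byte value as a signed 8-bit integer,
# widen it, and keep only the low 16 bits, mimicking a cast to a 16-bit char.
sign_extend_byte() {
    sb=$(( $1 ))
    if [ "$sb" -ge 128 ]; then
        sb=$(( sb - 256 ))    # e.g. 0xC3 -> -61, 0xA4 -> -92
    fi
    printf 'U+%04X\n' $(( sb & 0xFFFF ))
}

# The UTF-8 encoding of U+00E4 (ä) is the byte pair C3 A4
sign_extend_byte 0xC3   # U+FFC3
sign_extend_byte 0xA4   # U+FFA4
```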
@hankolerd
Which version contains the fix? (From a user’s PoV, it’s better to keep bug reports open until they can actually install a fixed version.)
But thanks for fixing it.
@mirabilos It has been released in 3.1.1
OK.
As a test case, I tried writing a Latin-1 “mäh” first, then a UTF-8 “mäh”.
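A minimal script for such a test case might look like this (a sketch; the exact script used was not posted). It uses `printf` octal escapes so the emitted bytes do not depend on the encoding the script file itself is saved in; `emit_test_lines` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical reproduction: print "mäh" twice, in two different encodings.
emit_test_lines() {
    printf 'm\344h\n'      # ISO-8859-1 (Latin-1): ä is the single byte 0xE4
    printf 'm\303\244h\n'  # UTF-8: ä is the byte pair 0xC3 0xA4
}

emit_test_lines
```

Piping its output through `od -An -tx1` shows `e4` in the first line and `c3 a4` in the second.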
With LC_ALL=C.UTF-8:
[INFO] m�h
[INFO] mäh
With LC_ALL=C:
[INFO] m?h
[INFO] m??h
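So it’s definitely decoding the bytes into wide characters and then converting them back to multibyte characters in the current locale’s encoding: bytes that are invalid in that encoding become replacement characters (� under UTF-8, ? under ASCII) instead of being passed through unchanged. This is precisely the follow-up bug I already warned about… but at least it’s an improvement over the situation before.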
As Slawomir said, comments on commits are easily missed; I never saw that thread.
As for the issue, I’ll repeat what he said: you can open a PR with a tentative fix.
Hi, this issue is also closed, so if something is still wrong, please:
- create a new issue with a description and better reproduction steps
- create a PR with a proposed fix
Comments on closed issues can also be missed.