maven-invoker-plugin

[MINVOKER-351] Escape special xml character in junit report

Open slawekjaranowski opened this issue 1 year ago • 33 comments

  • use StringEscapeUtils.escapeXml10 from commons-text

https://issues.apache.org/jira/browse/MINVOKER-351

slawekjaranowski avatar May 16 '24 20:05 slawekjaranowski

I don't understand this because it looks logically wrong. The model does not care how it is marshaled; shouldn't the XML writer do this?

michael-o avatar May 16 '24 20:05 michael-o

@michael-o there is a problem with special chars: they are neither escaped nor removed by plexus-xml

slawekjaranowski avatar May 16 '24 21:05 slawekjaranowski

@michael-o there is a problem with special chars: they are neither escaped nor removed by plexus-xml

Is there an upstream issue for this?

michael-o avatar May 17 '24 07:05 michael-o

Read the JIRA issue. I assume we are talking about chars outside of https://en.wikipedia.org/wiki/Valid_characters_in_XML#XML_1.0?

michael-o avatar May 17 '24 10:05 michael-o
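For reference, the XML 1.0 "Char" production allows only #x9, #xA, #xD and the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. A minimal sketch of that check follows; the class and method names are made up for illustration and are not part of plexus-xml or commons-text:

```java
public class Xml10Chars {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValid(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Counts the code points in 0..126 (the range the reproducer prints)
    // that are not allowed in an XML 1.0 document.
    static int countInvalidAscii() {
        int invalid = 0;
        for (int i = 0; i < Byte.MAX_VALUE; i++) {
            if (!isValid(i)) {
                invalid++;
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        System.out.println("invalid code points below 127: " + countInvalidAscii());
    }
}
```

Only tab, line feed and carriage return survive from the control range below 0x20; everything else there is what the report generation trips over.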

...obviously it is: https://github.com/apache/commons-text/blob/46c7a93ed0b3e43c369d89d15495fb2bce0e693b/src/main/java/org/apache/commons/text/StringEscapeUtils.java#L232-L274

michael-o avatar May 17 '24 10:05 michael-o

A similar implementation should exist in plexus-xml:

https://github.com/codehaus-plexus/plexus-xml/blob/master/src/main/java/org/codehaus/plexus/util/xml/PrettyPrintXMLWriter.java#L205-L222

slawekjaranowski avatar May 17 '24 10:05 slawekjaranowski

Looking into this. Can we make sure that we do not double-escape the five protected chars?

michael-o avatar May 17 '24 10:05 michael-o

When we use MXSerializer we get an exception, so special characters should be filtered somehow:

https://github.com/codehaus-plexus/plexus-xml/blob/master/src/main/java/org/codehaus/plexus/util/xml/pull/MXSerializer.java#L937-L958

slawekjaranowski avatar May 17 '24 10:05 slawekjaranowski

Looking into this. Can we make sure that we do not double-escape the five protected chars?

I used:

 Xpp3DomWriter.write(new PrettyPrintXMLWriter(osw), testsuite, false);

so escaping is turned off in Xpp3DomWriter

slawekjaranowski avatar May 17 '24 10:05 slawekjaranowski
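To make the double-escaping concern concrete: if text is escaped once before it reaches the writer and the writer escapes it again, the five predefined entities get mangled. A small sketch, assuming a hand-written helper (escapeFive is hypothetical and stands in for whichever escaper runs at each layer):

```java
public class DoubleEscapeDemo {

    // Hypothetical helper: escapes only the five predefined XML entities.
    static String escapeFive(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                case '\'':
                    sb.append("&apos;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String once = escapeFive("a < b & c");
        // Escaping already-escaped text turns "&lt;" into "&amp;lt;".
        String twice = escapeFive(once);
        System.out.println(once);
        System.out.println(twice);
    }
}
```

This is why exactly one layer should be responsible for escaping, either the caller or the writer, but not both.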

@elharo Should an XML writer escape invalid chars to entities or fail here?

michael-o avatar May 17 '24 10:05 michael-o

Looking into this. Can we make sure that we do not double-escape the five protected chars?

I used:

 Xpp3DomWriter.write(new PrettyPrintXMLWriter(osw), testsuite, false);

so escaping is turned off in Xpp3DomWriter

Maybe you should add a comment in the source code to document this.

michael-o avatar May 17 '24 10:05 michael-o

Am I stupid or where am I supposed to find the output of new Example().printAscii();? I can't find it.

michael-o avatar May 17 '24 10:05 michael-o

Am I stupid or where am I supposed to find the output of new Example().printAscii();? I can't find it.

There is:

public class Example {
  // Prints every code point from 0 to 126, including the control characters.
  public void printAscii() {
    for (int i = 0; i < Byte.MAX_VALUE; ++i) {
      System.out.println((char) i);
    }
  }
}

The standard output of the test is passed to the log file and then to the report.

slawekjaranowski avatar May 17 '24 11:05 slawekjaranowski

Am I stupid or where am I supposed to find the output of new Example().printAscii();? I can't find it.

There is:

public class Example {
  // Prints every code point from 0 to 126, including the control characters.
  public void printAscii() {
    for (int i = 0; i < Byte.MAX_VALUE; ++i) {
      System.out.println((char) i);
    }
  }
}

The standard output of the test is passed to the log file and then to the report.

I know, but I don't see that stdout anywhere.

michael-o avatar May 17 '24 11:05 michael-o

As we can see, the root cause is in plexus-xml. I know that using escapeXml10 is only a workaround here.

So what do you propose to solve the issue in m-invoker-p?

@michael-o @elharo

slawekjaranowski avatar May 17 '24 12:05 slawekjaranowski

As we can see, the root cause is in plexus-xml. I know that using escapeXml10 is only a workaround here.

So what do you propose to solve the issue in m-invoker-p?

@michael-o @elharo

Regardless of the fix or the flaw in Plexus XML, I completely fail to understand the reproducer supplied in the JIRA issue.

michael-o avatar May 17 '24 13:05 michael-o

As we can see, the root cause is in plexus-xml. I know that using escapeXml10 is only a workaround here. So what do you propose to solve the issue in m-invoker-p? @michael-o @elharo

Regardless of the fix or the flaw in Plexus XML, I completely fail to understand the reproducer supplied in the JIRA issue.

Ok.

  • we execute m-invoker-p for the project src/it/MINVOKER-351
    • m-invoker-p executes the project src/it/MINVOKER-351/src/it/minvoker-351
      • in src/it/MINVOKER-351/src/it/minvoker-351 we have a unit test which prints some special chars to the build output
    • m-invoker-p collects the output of src/it/MINVOKER-351/src/it/minvoker-351 and stores it in the buildlog file
    • m-invoker-p generates a junit report with a systemOut element which contains the body of the buildlog file
  • we execute site-plugin with surefire-report to confirm that the generated junit reports are ok

slawekjaranowski avatar May 17 '24 14:05 slawekjaranowski

Moving to draft because the IT is poorly designed.

michael-o avatar May 17 '24 18:05 michael-o

Moving to draft because the IT is poorly designed.

I hope you will redesign the proposed IT, or that you have some other hints on how it should be done.

slawekjaranowski avatar May 17 '24 18:05 slawekjaranowski

Moving to draft because the IT is poorly designed.

I hope you will redesign the proposed IT, or that you have some other hints on how it should be done.

It is rather the conditions under which the IT comes into play. Please read my analysis on the JIRA issue.

michael-o avatar May 17 '24 18:05 michael-o

@slawekjaranowski I just ran the changed IT with 3.8.8 and 3.9.6. In 3.8.8 I don't see the output from 0x00 to 0x0F at all in build.log, regardless of the escaping you applied. There is some inherent bug here which I would prefer to understand first, before applying this change.

michael-o avatar May 17 '24 19:05 michael-o

@slawekjaranowski Please pin the compiler and surefire plugin versions in the embedded IT; that's the reason why there is no output.

michael-o avatar May 17 '24 19:05 michael-o

Please also prepend the character name: https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#getName%28int%29 Then we know what we are printing.

michael-o avatar May 17 '24 19:05 michael-o
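A possible shape for such a labelled line, as a sketch only: it prints the hex value instead of the raw character so the listing stays readable in a terminal. The class name and the exact line format are assumptions for illustration, not the code used in the IT:

```java
public class CharNames {

    // Formats one line per code point: hex value followed by its Unicode name.
    static String describe(int codePoint) {
        return String.format("0x%02X - name: %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // Same range as the reproducer: 0 up to, but not including, 127.
        for (int i = 0; i < Byte.MAX_VALUE; i++) {
            System.out.println(describe(i));
        }
    }
}
```

With the name next to each value it is immediately visible which control characters go missing from build.log.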

@michael-o IT improved according to hints

slawekjaranowski avatar May 18 '24 12:05 slawekjaranowski

@elharo is right. This is how it should look:

    public static void main(String[] args) throws ParserConfigurationException, TransformerException {
        // Build a DOM with one <char> element per code point from 0 to 126.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        for (int i = 0; i < Byte.MAX_VALUE; i++) {
            Element elem = doc.createElement("char");
            elem.setTextContent(Character.getName(i) + ": " + ((char) i));
            root.appendChild(elem);
        }
        doc.appendChild(root);

        // Serialize the DOM to stdout with the JDK's built-in transformer.
        DOMSource domSource = new DOMSource(doc);
        StreamResult result = new StreamResult(System.out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = tf.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        transformer.transform(domSource, result);
    }

output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
  <char>NULL: &#0;</char>
  <char>START OF HEADING: &#1;</char>
  <char>START OF TEXT: &#2;</char>
  <char>END OF TEXT: &#3;</char>
  <char>END OF TRANSMISSION: &#4;</char>
  <char>ENQUIRY: &#5;</char>
  <char>ACKNOWLEDGE: &#6;</char>
  <char>BELL: &#7;</char>
  <char>BACKSPACE: &#8;</char>
  <char>CHARACTER TABULATION: 	</char>
  <char>LINE FEED (LF): 
</char>
  <char>LINE TABULATION: &#11;</char>
  <char>FORM FEED (FF): &#12;</char>
  <char>CARRIAGE RETURN (CR): &#13;</char>
  <char>SHIFT OUT: &#14;</char>
  <char>SHIFT IN: &#15;</char>
  <char>DATA LINK ESCAPE: &#16;</char>
  <char>DEVICE CONTROL ONE: &#17;</char>
  <char>DEVICE CONTROL TWO: &#18;</char>
  <char>DEVICE CONTROL THREE: &#19;</char>
  <char>DEVICE CONTROL FOUR: &#20;</char>
  <char>NEGATIVE ACKNOWLEDGE: &#21;</char>
  <char>SYNCHRONOUS IDLE: &#22;</char>
  <char>END OF TRANSMISSION BLOCK: &#23;</char>
  <char>CANCEL: &#24;</char>
  <char>END OF MEDIUM: &#25;</char>
  <char>SUBSTITUTE: &#26;</char>
  <char>ESCAPE: &#27;</char>
  <char>INFORMATION SEPARATOR FOUR: &#28;</char>
  <char>INFORMATION SEPARATOR THREE: &#29;</char>
  <char>INFORMATION SEPARATOR TWO: &#30;</char>
  <char>INFORMATION SEPARATOR ONE: &#31;</char>
  <char>SPACE:  </char>
  <char>EXCLAMATION MARK: !</char>
  <char>QUOTATION MARK: "</char>
  <char>NUMBER SIGN: #</char>
  <char>DOLLAR SIGN: $</char>
  <char>PERCENT SIGN: %</char>
  <char>AMPERSAND: &amp;</char>
  <char>APOSTROPHE: '</char>
  <char>LEFT PARENTHESIS: (</char>
  <char>RIGHT PARENTHESIS: )</char>
  <char>ASTERISK: *</char>
  <char>PLUS SIGN: +</char>
  <char>COMMA: ,</char>
  <char>HYPHEN-MINUS: -</char>
  <char>FULL STOP: .</char>
  <char>SOLIDUS: /</char>
  <char>DIGIT ZERO: 0</char>
  <char>DIGIT ONE: 1</char>
  <char>DIGIT TWO: 2</char>
  <char>DIGIT THREE: 3</char>
  <char>DIGIT FOUR: 4</char>
  <char>DIGIT FIVE: 5</char>
  <char>DIGIT SIX: 6</char>
  <char>DIGIT SEVEN: 7</char>
  <char>DIGIT EIGHT: 8</char>
  <char>DIGIT NINE: 9</char>
  <char>COLON: :</char>
  <char>SEMICOLON: ;</char>
  <char>LESS-THAN SIGN: &lt;</char>
  <char>EQUALS SIGN: =</char>
  <char>GREATER-THAN SIGN: &gt;</char>
  <char>QUESTION MARK: ?</char>
  <char>COMMERCIAL AT: @</char>
  <char>LATIN CAPITAL LETTER A: A</char>
  <char>LATIN CAPITAL LETTER B: B</char>
  <char>LATIN CAPITAL LETTER C: C</char>
  <char>LATIN CAPITAL LETTER D: D</char>
  <char>LATIN CAPITAL LETTER E: E</char>
  <char>LATIN CAPITAL LETTER F: F</char>
  <char>LATIN CAPITAL LETTER G: G</char>
  <char>LATIN CAPITAL LETTER H: H</char>
  <char>LATIN CAPITAL LETTER I: I</char>
  <char>LATIN CAPITAL LETTER J: J</char>
  <char>LATIN CAPITAL LETTER K: K</char>
  <char>LATIN CAPITAL LETTER L: L</char>
  <char>LATIN CAPITAL LETTER M: M</char>
  <char>LATIN CAPITAL LETTER N: N</char>
  <char>LATIN CAPITAL LETTER O: O</char>
  <char>LATIN CAPITAL LETTER P: P</char>
  <char>LATIN CAPITAL LETTER Q: Q</char>
  <char>LATIN CAPITAL LETTER R: R</char>
  <char>LATIN CAPITAL LETTER S: S</char>
  <char>LATIN CAPITAL LETTER T: T</char>
  <char>LATIN CAPITAL LETTER U: U</char>
  <char>LATIN CAPITAL LETTER V: V</char>
  <char>LATIN CAPITAL LETTER W: W</char>
  <char>LATIN CAPITAL LETTER X: X</char>
  <char>LATIN CAPITAL LETTER Y: Y</char>
  <char>LATIN CAPITAL LETTER Z: Z</char>
  <char>LEFT SQUARE BRACKET: [</char>
  <char>REVERSE SOLIDUS: \</char>
  <char>RIGHT SQUARE BRACKET: ]</char>
  <char>CIRCUMFLEX ACCENT: ^</char>
  <char>LOW LINE: _</char>
  <char>GRAVE ACCENT: `</char>
  <char>LATIN SMALL LETTER A: a</char>
  <char>LATIN SMALL LETTER B: b</char>
  <char>LATIN SMALL LETTER C: c</char>
  <char>LATIN SMALL LETTER D: d</char>
  <char>LATIN SMALL LETTER E: e</char>
  <char>LATIN SMALL LETTER F: f</char>
  <char>LATIN SMALL LETTER G: g</char>
  <char>LATIN SMALL LETTER H: h</char>
  <char>LATIN SMALL LETTER I: i</char>
  <char>LATIN SMALL LETTER J: j</char>
  <char>LATIN SMALL LETTER K: k</char>
  <char>LATIN SMALL LETTER L: l</char>
  <char>LATIN SMALL LETTER M: m</char>
  <char>LATIN SMALL LETTER N: n</char>
  <char>LATIN SMALL LETTER O: o</char>
  <char>LATIN SMALL LETTER P: p</char>
  <char>LATIN SMALL LETTER Q: q</char>
  <char>LATIN SMALL LETTER R: r</char>
  <char>LATIN SMALL LETTER S: s</char>
  <char>LATIN SMALL LETTER T: t</char>
  <char>LATIN SMALL LETTER U: u</char>
  <char>LATIN SMALL LETTER V: v</char>
  <char>LATIN SMALL LETTER W: w</char>
  <char>LATIN SMALL LETTER X: x</char>
  <char>LATIN SMALL LETTER Y: y</char>
  <char>LATIN SMALL LETTER Z: z</char>
  <char>LEFT CURLY BRACKET: {</char>
  <char>VERTICAL LINE: |</char>
  <char>RIGHT CURLY BRACKET: }</char>
  <char>TILDE: ~</char>
</root>

which is not what the Plexus serializer produces. In other words: the Plexus serializer is broken.

michael-o avatar May 18 '24 19:05 michael-o

Counterpart with Plexus:

        MXSerializer sr = new MXSerializer();
        sr.setOutput(System.out, "UTF-8");
        sr.startDocument(null, Boolean.TRUE);
        sr.startTag(null, "root");
        for (int i = 0; i < Byte.MAX_VALUE; i++) {
            sr.startTag(null, "char");
            sr.text(Character.getName(i) + ": " + ((char) i));
            sr.endTag(null, "char");
        }
        sr.endTag(null, "root");
        sr.endDocument();

output:

Exception in thread "main" java.lang.IllegalStateException: character 0 is not allowed in output
	at org.codehaus.plexus.util.xml.pull.MXSerializer.writeElementContent(MXSerializer.java:947)
	at org.codehaus.plexus.util.xml.pull.MXSerializer.text(MXSerializer.java:780)
	at org.apache.maven.doxia.siterenderer.DefaultSiteRenderer.main(DefaultSiteRenderer.java:955)

michael-o avatar May 18 '24 19:05 michael-o

With StringEscapeUtils:

<?xml version="1.0" standalone="yes"?><root><char>NULL: </char><char>START OF HEADING: </char><char>START OF TEXT: </char><char>END OF TEXT: </char><char>END OF TRANSMISSION: </char><char>ENQUIRY: </char><char>ACKNOWLEDGE: </char><char>BELL: </char><char>BACKSPACE: </char><char>CHARACTER TABULATION: 	</char><char>LINE FEED (LF): 
</char><char>LINE TABULATION: </char><char>FORM FEED (FF): </char><char>CARRIAGE RETURN (CR): 
</char><char>SHIFT OUT: </char><char>SHIFT IN: </char><char>DATA LINK ESCAPE: </char><char>DEVICE CONTROL ONE: </char><char>DEVICE CONTROL TWO: </char><char>DEVICE CONTROL THREE: </char><char>DEVICE CONTROL FOUR: </char><char>NEGATIVE ACKNOWLEDGE: </char><char>SYNCHRONOUS IDLE: </char><char>END OF TRANSMISSION BLOCK: </char><char>CANCEL: </char><char>END OF MEDIUM: </char><char>SUBSTITUTE: </char><char>ESCAPE: </char><char>INFORMATION SEPARATOR FOUR: </char><char>INFORMATION SEPARATOR THREE: </char><char>INFORMATION SEPARATOR TWO: </char><char>INFORMATION SEPARATOR ONE: </char><char>SPACE:  </char><char>EXCLAMATION MARK: !</char><char>QUOTATION MARK: &amp;quot;</char><char>NUMBER SIGN: #</char><char>DOLLAR SIGN: $</char><char>PERCENT SIGN: %</char><char>AMPERSAND: &amp;amp;</char><char>APOSTROPHE: &amp;apos;</char><char>LEFT PARENTHESIS: (</char><char>RIGHT PARENTHESIS: )</char><char>ASTERISK: *</char><char>PLUS SIGN: +</char><char>COMMA: ,</char><char>HYPHEN-MINUS: -</char><char>FULL STOP: .</char><char>SOLIDUS: /</char><char>DIGIT ZERO: 0</char><char>DIGIT ONE: 1</char><char>DIGIT TWO: 2</char><char>DIGIT THREE: 3</char><char>DIGIT FOUR: 4</char><char>DIGIT FIVE: 5</char><char>DIGIT SIX: 6</char><char>DIGIT SEVEN: 7</char><char>DIGIT EIGHT: 8</char><char>DIGIT NINE: 9</char><char>COLON: :</char><char>SEMICOLON: ;</char><char>LESS-THAN SIGN: &amp;lt;</char><char>EQUALS SIGN: =</char><char>GREATER-THAN SIGN: &amp;gt;</char><char>QUESTION MARK: ?</char><char>COMMERCIAL AT: @</char><char>LATIN CAPITAL LETTER A: A</char><char>LATIN CAPITAL LETTER B: B</char><char>LATIN CAPITAL LETTER C: C</char><char>LATIN CAPITAL LETTER D: D</char><char>LATIN CAPITAL LETTER E: E</char><char>LATIN CAPITAL LETTER F: F</char><char>LATIN CAPITAL LETTER G: G</char><char>LATIN CAPITAL LETTER H: H</char><char>LATIN CAPITAL LETTER I: I</char><char>LATIN CAPITAL LETTER J: J</char><char>LATIN CAPITAL LETTER K: K</char><char>LATIN CAPITAL LETTER L: L</char><char>LATIN CAPITAL LETTER M: 
M</char><char>LATIN CAPITAL LETTER N: N</char><char>LATIN CAPITAL LETTER O: O</char><char>LATIN CAPITAL LETTER P: P</char><char>LATIN CAPITAL LETTER Q: Q</char><char>LATIN CAPITAL LETTER R: R</char><char>LATIN CAPITAL LETTER S: S</char><char>LATIN CAPITAL LETTER T: T</char><char>LATIN CAPITAL LETTER U: U</char><char>LATIN CAPITAL LETTER V: V</char><char>LATIN CAPITAL LETTER W: W</char><char>LATIN CAPITAL LETTER X: X</char><char>LATIN CAPITAL LETTER Y: Y</char><char>LATIN CAPITAL LETTER Z: Z</char><char>LEFT SQUARE BRACKET: [</char><char>REVERSE SOLIDUS: \</char><char>RIGHT SQUARE BRACKET: ]</char><char>CIRCUMFLEX ACCENT: ^</char><char>LOW LINE: _</char><char>GRAVE ACCENT: `</char><char>LATIN SMALL LETTER A: a</char><char>LATIN SMALL LETTER B: b</char><char>LATIN SMALL LETTER C: c</char><char>LATIN SMALL LETTER D: d</char><char>LATIN SMALL LETTER E: e</char><char>LATIN SMALL LETTER F: f</char><char>LATIN SMALL LETTER G: g</char><char>LATIN SMALL LETTER H: h</char><char>LATIN SMALL LETTER I: i</char><char>LATIN SMALL LETTER J: j</char><char>LATIN SMALL LETTER K: k</char><char>LATIN SMALL LETTER L: l</char><char>LATIN SMALL LETTER M: m</char><char>LATIN SMALL LETTER N: n</char><char>LATIN SMALL LETTER O: o</char><char>LATIN SMALL LETTER P: p</char><char>LATIN SMALL LETTER Q: q</char><char>LATIN SMALL LETTER R: r</char><char>LATIN SMALL LETTER S: s</char><char>LATIN SMALL LETTER T: t</char><char>LATIN SMALL LETTER U: u</char><char>LATIN SMALL LETTER V: v</char><char>LATIN SMALL LETTER W: w</char><char>LATIN SMALL LETTER X: x</char><char>LATIN SMALL LETTER Y: y</char><char>LATIN SMALL LETTER Z: z</char><char>LEFT CURLY BRACKET: {</char><char>VERTICAL LINE: |</char><char>RIGHT CURLY BRACKET: }</char><char>TILDE: ~</char></root>

I believe they are broken as well.

michael-o avatar May 18 '24 19:05 michael-o

Naive upstream solution: https://github.com/codehaus-plexus/plexus-xml/pull/28

michael-o avatar May 18 '24 20:05 michael-o

I have now tested the IT with the patched Plexus XML. Output looks fine now:

[INFO] Running example.minvoker351.ExampleTest
&#0; - name: NULL
&#1; - name: START OF HEADING
&#2; - name: START OF TEXT
&#3; - name: END OF TEXT
&#4; - name: END OF TRANSMISSION
&#5; - name: ENQUIRY
&#6; - name: ACKNOWLEDGE
&#7; - name: BELL
&#8; - name: BACKSPACE
	 - name: CHARACTER TABULATION

 - name: LINE FEED (LF)
&#11; - name: LINE TABULATION
&#12; - name: FORM FEED (FF)

 - name: CARRIAGE RETURN (CR)
&#14; - name: SHIFT OUT
&#15; - name: SHIFT IN
&#16; - name: DATA LINK ESCAPE
&#17; - name: DEVICE CONTROL ONE
&#18; - name: DEVICE CONTROL TWO
&#19; - name: DEVICE CONTROL THREE
&#20; - name: DEVICE CONTROL FOUR
&#21; - name: NEGATIVE ACKNOWLEDGE
&#22; - name: SYNCHRONOUS IDLE
&#23; - name: END OF TRANSMISSION BLOCK
&#24; - name: CANCEL
&#25; - name: END OF MEDIUM
&#26; - name: SUBSTITUTE
&#27; - name: ESCAPE
&#28; - name: INFORMATION SEPARATOR FOUR
&#29; - name: INFORMATION SEPARATOR THREE
&#30; - name: INFORMATION SEPARATOR TWO
&#31; - name: INFORMATION SEPARATOR ONE
  - name: SPACE
! - name: EXCLAMATION MARK
&quot; - name: QUOTATION MARK
# - name: NUMBER SIGN
$ - name: DOLLAR SIGN
% - name: PERCENT SIGN
&amp; - name: AMPERSAND
&apos; - name: APOSTROPHE
( - name: LEFT PARENTHESIS
) - name: RIGHT PARENTHESIS
* - name: ASTERISK
+ - name: PLUS SIGN
, - name: COMMA
- - name: HYPHEN-MINUS
. - name: FULL STOP
/ - name: SOLIDUS
0 - name: DIGIT ZERO
1 - name: DIGIT ONE
2 - name: DIGIT TWO
3 - name: DIGIT THREE
4 - name: DIGIT FOUR
5 - name: DIGIT FIVE
6 - name: DIGIT SIX
7 - name: DIGIT SEVEN
8 - name: DIGIT EIGHT
9 - name: DIGIT NINE
: - name: COLON
; - name: SEMICOLON
&lt; - name: LESS-THAN SIGN
= - name: EQUALS SIGN
&gt; - name: GREATER-THAN SIGN
? - name: QUESTION MARK
@ - name: COMMERCIAL AT
A - name: LATIN CAPITAL LETTER A
B - name: LATIN CAPITAL LETTER B
C - name: LATIN CAPITAL LETTER C
D - name: LATIN CAPITAL LETTER D
E - name: LATIN CAPITAL LETTER E
F - name: LATIN CAPITAL LETTER F
G - name: LATIN CAPITAL LETTER G
H - name: LATIN CAPITAL LETTER H
I - name: LATIN CAPITAL LETTER I
J - name: LATIN CAPITAL LETTER J
K - name: LATIN CAPITAL LETTER K
L - name: LATIN CAPITAL LETTER L
M - name: LATIN CAPITAL LETTER M
N - name: LATIN CAPITAL LETTER N
O - name: LATIN CAPITAL LETTER O
P - name: LATIN CAPITAL LETTER P
Q - name: LATIN CAPITAL LETTER Q
R - name: LATIN CAPITAL LETTER R
S - name: LATIN CAPITAL LETTER S
T - name: LATIN CAPITAL LETTER T
U - name: LATIN CAPITAL LETTER U
V - name: LATIN CAPITAL LETTER V
W - name: LATIN CAPITAL LETTER W
X - name: LATIN CAPITAL LETTER X
Y - name: LATIN CAPITAL LETTER Y
Z - name: LATIN CAPITAL LETTER Z
[ - name: LEFT SQUARE BRACKET
\ - name: REVERSE SOLIDUS
] - name: RIGHT SQUARE BRACKET
^ - name: CIRCUMFLEX ACCENT
_ - name: LOW LINE
` - name: GRAVE ACCENT
a - name: LATIN SMALL LETTER A
b - name: LATIN SMALL LETTER B
c - name: LATIN SMALL LETTER C
d - name: LATIN SMALL LETTER D
e - name: LATIN SMALL LETTER E
f - name: LATIN SMALL LETTER F
g - name: LATIN SMALL LETTER G
h - name: LATIN SMALL LETTER H
i - name: LATIN SMALL LETTER I
j - name: LATIN SMALL LETTER J
k - name: LATIN SMALL LETTER K
l - name: LATIN SMALL LETTER L
m - name: LATIN SMALL LETTER M
n - name: LATIN SMALL LETTER N
o - name: LATIN SMALL LETTER O
p - name: LATIN SMALL LETTER P
q - name: LATIN SMALL LETTER Q
r - name: LATIN SMALL LETTER R
s - name: LATIN SMALL LETTER S
t - name: LATIN SMALL LETTER T
u - name: LATIN SMALL LETTER U
v - name: LATIN SMALL LETTER V
w - name: LATIN SMALL LETTER W
x - name: LATIN SMALL LETTER X
y - name: LATIN SMALL LETTER Y
z - name: LATIN SMALL LETTER Z
{ - name: LEFT CURLY BRACKET
| - name: VERTICAL LINE
} - name: RIGHT CURLY BRACKET
~ - name: TILDE
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.080 s -- in example.minvoker351.ExampleTest

but the reader chokes now:

Caused by: org.xml.sax.SAXParseException: Character reference "&#
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException (ErrorHandlerWrapper.java:204)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError (ErrorHandlerWrapper.java:178)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError (XMLErrorReporter.java:399)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError (XMLErrorReporter.java:326)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError (XMLScanner.java:1466)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanCharReferenceValue (XMLScanner.java:1339)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next (XMLDocumentFragmentScannerImpl.java:3052)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next (XMLDocumentScannerImpl.java:601)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument (XMLDocumentFragmentScannerImpl.java:504)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse (XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse (XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse (XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse (AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse (SAXParserImpl.java:642)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse (SAXParserImpl.java:326)
    at org.apache.maven.plugins.surefire.report.TestSuiteXmlParser.parse (TestSuiteXmlParser.java:91)

See: https://stackoverflow.com/questions/55335528/fatal-error-character-reference-org-xml-sax-saxparseexception

michael-o avatar May 18 '24 20:05 michael-o
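The reader-side failure can be reproduced with the JDK alone. A minimal sketch (the class and method names are made up for illustration) showing that a character reference to an allowed character parses, while a reference to a control character that XML 1.0 forbids is rejected:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CharRefParseDemo {

    // Returns true if the JDK's default DOM parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("&#9; parses: " + parses("<root>&#9;</root>"));
        System.out.println("&#1; parses: " + parses("<root>&#1;</root>"));
    }
}
```

So writing the forbidden characters as numeric references does not make the document readable by an XML 1.0 parser; the reference itself is what gets rejected.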

And the value is indeed correct:

        // Parse the serialized output back in and print each element's text.
        Document doc2 = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(w.toString().getBytes(StandardCharsets.UTF_8)));
        NodeList childNodes = doc2.getDocumentElement().getChildNodes();
        for (int i = 0; i < childNodes.getLength(); i++) {
            System.out.println(childNodes.item(i).getTextContent());
        }

output:

[Fatal Error] :3:19: Character reference "&#

Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 19; Character reference "&#
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:338)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
	at org.apache.maven.doxia.siterenderer.DefaultSiteRenderer.main(DefaultSiteRenderer.java:954)

While I don't understand why those values can be serialized but cannot be deserialized, the best option would be to silently drop them during serialization in Plexus XML.

michael-o avatar May 18 '24 20:05 michael-o
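Silently dropping the offending characters could look roughly like the sketch below. This is only an illustration of the idea using the XML 1.0 Char ranges; the class and method names are hypothetical and this is not the actual Plexus XML change:

```java
public class Xml10Filter {

    // XML 1.0 "Char" production.
    static boolean isValid(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Removes every code point that an XML 1.0 document may not contain.
    static String strip(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().filter(Xml10Filter::isValid).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // NUL and BELL are dropped, the tab is kept.
        System.out.println(strip("a\u0000b\u0007c\td"));
    }
}
```

The remaining text still needs the usual escaping of the five protected characters afterwards; the filter only deals with characters that have no valid XML 1.0 representation at all.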