Strings are not properly escaped in JUnit XML reports
Reproduction code:
test_reproducer.lua
local lu = require('luaunit')
function test_str_compare_null_byte()
    local actual = "q\000\000\002w\000"
    local expected = "q\000\000\002w\000\000"
    lu.assertEquals(actual, expected)
end
os.exit( lu.LuaUnit.run() )
$ lua test_reproducer.lua --output junit --name report | cat --show-nonprinting
# XML output to report.xml
# Started on 07/25/24 16:34:54
# Starting test: test_str_compare_null_byte
# Failure: test_reproducer.lua:7: expected: "q^@^@^Bw^@^@"
# actual: "q^@^@^Bw^@"
# Ran 1 tests in 0.002 seconds, 0 successes, 1 failure
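The problem is that the JUnit XML report, like the console output, contains these characters unescaped. XML 1.0 does not allow characters such as U+0000 (NUL) or U+0002 anywhere in a document, not even as numeric character references. The result is invalid XML that the XML parsers I've tried refuse to read: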
$ cat --show-nonprinting report.xml
<?xml version="1.0" encoding="UTF-8" ?>
<testsuites>
<testsuite name="LuaUnit" id="00001" package="" hostname="localhost" tests="1" timestamp="2024-07-25T16:36:05" time="0.003" errors="0" failures="1" skipped="0">
<properties>
<property name="Lua Version" value="Lua 5.3"/>
<property name="LuaUnit Version" value="3.4"/>
</properties>
<testcase classname="[TestFunctions]" name="test_str_compare_null_byte" time="0.002">
<failure type="test_reproducer.lua:7: expected: "q^@^@^Bw^@^@"
actual: "q^@^@^Bw^@"">
<![CDATA[stack traceback:
test_reproducer.lua:7: in function 'test_str_compare_null_byte']]></failure>
</testcase>
<system-out/>
<system-err/>
</testsuite>
</testsuites>
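Here is a small Python sketch that checks a string against the XML 1.0 `Char` production and lists the characters that no XML 1.0 document may contain. The helper names are my own, not LuaUnit's. The message string is copied from the `type` attribute of the report above:

```python
# Sketch: find characters that the XML 1.0 "Char" production forbids.
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Such characters may not appear in an XML 1.0 document at all, not even as
# character references like &#0;, so escaping &, <, > and quotes is not enough.


def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_xml10_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for every character XML 1.0 does not allow."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ch)
    ]


# The failure message that ends up in the type="..." attribute above.
message = 'test_reproducer.lua:7: expected: "q\x00\x00\x02w\x00\x00"\nactual: "q\x00\x00\x02w\x00"'

for index, codepoint in invalid_xml10_chars(message):
    print(index, codepoint)
```

The newline between the two lines of the message is fine. Every NUL and the STX (U+0002) in both strings are not.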
I tried:
- Ruby
parse_xml.rb:
require 'rexml/document'

REXML::Document.new(File.read('report.xml'))
Console output:
$ ruby parse_xml.rb
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:96:in `rescue in parse': #<RuntimeError: Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: "q\u0000\u0000\u0002w\u0000\u0000"\nactual: "q\u0000\u0000\u0002w\u0000""> (REXML::ParseException)
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/attribute.rb:175:in `element='
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/element.rb:2384:in `[]='
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:36:in `block in parse'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `each'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `parse'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
parse_xml.rb:3:in `new'
parse_xml.rb:3:in `<main>'
...
Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: "q\u0000\u0000\u0002w\u0000\u0000"\nactual: "q\u0000\u0000\u0002w\u0000""
Line: 10
Position: 581
Last 80 unconsumed characters:
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:21:in `parse'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
        from parse_xml.rb:3:in `new'
        from parse_xml.rb:3:in `<main>'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check': Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: "q\u0000\u0000\u0002w\u0000\u0000"\nactual: "q\u0000\u0000\u0002w\u0000"" (RuntimeError)
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/attribute.rb:175:in `element='
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/element.rb:2384:in `[]='
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:36:in `block in parse'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `each'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `parse'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
        from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
        from parse_xml.rb:3:in `new'
        from parse_xml.rb:3:in `<main>'
- Python (after pip install defusedxml):
parse_xml.py:
from defusedxml.ElementTree import parse
et = parse('report.xml')
Console output:
$ python parse_xml.py
Traceback (most recent call last):
  File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1706, in feed
    self.parser.Parse(data, False)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 9, column 67

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\temp\ks-experiments\luaunit-bug\parse_xml.py", line 2, in <module>
    et = parse('report.xml')
         ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\site-packages\defusedxml\common.py", line 100, in parse
    return _parse(source, parser)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1204, in parse
    tree.parse(source, parser)
  File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 572, in parse
    parser.feed(data)
  File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1708, in feed
    self._raiseerror(v)
  File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1615, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 9, column 67
- xmllint (see its manpage; available on Ubuntu in the libxml2-utils package)
Console output:
$ xmllint report.xml
report.xml:9: parser error : Char 0x0 out of allowed range
<failure type="test_reproducer.lua:7: expected: "q
                                                  ^
report.xml:9: parser error : AttValue: ' expected
<failure type="test_reproducer.lua:7: expected: "q
                                                  ^
report.xml:9: parser error : attributes construct error
<failure type="test_reproducer.lua:7: expected: "q
                                                  ^
report.xml:9: parser error : Couldn't find end of Start Tag failure line 9
<failure type="test_reproducer.lua:7: expected: "q
                                                  ^
report.xml:9: parser error : Premature end of data in tag testcase line 8
<failure type="test_reproducer.lua:7: expected: "q
                                                  ^
As you can see, all of these reject the XML report with some kind of error, indicating that it is not well-formed XML.
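The Python standard-library parser behaves the same way without defusedxml. In the minimal sketch below, the elements are hand-written stand-ins for LuaUnit's `<failure>` element, with the quotes escaped as `&quot;` so that the control characters are the only problem. It also shows that writing NUL as a numeric character reference (`&#0;`) does not help, because XML 1.0 forbids that character entirely:

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Return True if xml.etree.ElementTree accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        print("rejected:", err)
        return False
    return True


# Raw NUL and STX characters inside an attribute value.
bad = '<failure type="expected: &quot;q\x00\x00\x02w\x00&quot;"/>'
# The same character written as a numeric character reference.
as_reference = '<failure type="expected: &quot;q&#0;w&quot;"/>'
# The same element with the control characters removed.
good = '<failure type="expected: &quot;qw&quot;"/>'

for name, doc in [("bad", bad), ("as_reference", as_reference), ("good", good)]:
    print(name, parses(doc))
```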
You might want to take a look at, for example, https://github.com/xmlrunner/unittest-xml-reporting from the Python ecosystem to see how it handles this situation:
test_repro.py
import unittest

class TestRepro(unittest.TestCase):
    def test_str_compare_null_byte(self):
        actual = "q\u0000\u0000\u0002w\u0000"
        expected = "q\u0000\u0000\u0002w\u0000\u0000"
        self.assertEqual(actual, expected)


if __name__ == '__main__':
    unittest.main()
Make sure to pip install unittest-xml-reporting first (I'm using the latest version 3.2.0):
$ python -m xmlrunner --outsuffix '' 2>&1 | cat --show-nonprinting
Running tests...
----------------------------------------------------------------------
F
======================================================================
FAIL [0.001s]: test_str_compare_null_byte (test_repro.TestRepro.test_str_compare_null_byte)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\temp\ks-experiments\luaunit-bug\test_repro.py", line 7, in test_str_compare_null_byte
self.assertEqual(actual, expected)
AssertionError: 'q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'
- q^@^@^Bw^@
+ q^@^@^Bw^@^@
? +
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (failures=1)
Generating XML reports...
TEST-test_repro.TestRepro.xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="test_repro.TestRepro" tests="1" file="test_repro.py" time="0.001" timestamp="2024-07-25T17:03:59" failures="1" errors="0" skipped="0">
<testcase classname="test_repro.TestRepro" name="test_str_compare_null_byte" time="0.001" timestamp="2024-07-25T17:03:59" file="test_repro.py" line="4">
<failure type="AssertionError" message="'q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'
- qw
+ qw
? +
"><![CDATA[Traceback (most recent call last):
File "C:\temp\ks-experiments\luaunit-bug\test_repro.py", line 7, in test_str_compare_null_byte
self.assertEqual(actual, expected)
AssertionError: 'q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'
- qw
+ qw
? +
]]></failure>
</testcase>
</testsuite>
Note that the assertion failure message is escaped ('q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'). This means that 1. the representation uses only basic ASCII characters, so it doesn't cause any problems in the XML report or elsewhere, and 2. the full contents of each string are preserved, so the message is always meaningful for debugging.
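Here is a minimal Python sketch of the same approach, assuming the report writer builds the attribute by hand. `ascii()` produces the repr-style escaped form, and `xml.sax.saxutils.quoteattr()` handles the XML-level escaping. The function name `failure_element` is hypothetical. LuaUnit would need an equivalent escaping step in Lua, and this code only illustrates the idea:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def failure_element(actual: str, expected: str) -> str:
    """Build a <failure> element whose message is printable ASCII only."""
    # ascii() escapes control and non-ASCII characters (\x00, \x02, \xe9, ...)
    # the same way repr() does for them, so no information is lost.
    message = f"{ascii(actual)} != {ascii(expected)}"
    # quoteattr() escapes &, < and >, turns newlines/tabs into numeric
    # references, and wraps the value in suitable quotes.
    return f"<failure message={quoteattr(message)}/>"


xml_text = failure_element("q\x00\x00\x02w\x00", "q\x00\x00\x02w\x00\x00")
print(xml_text)

# The result is well-formed, and the attribute reads back as the escaped text.
print(ET.fromstring(xml_text).get("message"))
```

Escaping keeps the full contents of both strings readable. Escaping at the XML level alone (`&`, `<`, `>`, quotes) would not be enough, because it leaves NUL and the other control characters in place.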
xmlrunner also tries to display a line-by-line diff (the - qw and + qw lines). In this case the diff turns out to be useless, because all the non-printable characters (\x00 and \x02) were filtered out of these lines. That is still better than writing them into the XML, which would make it invalid. The real contents of both strings are already clear from the escaped form, so it doesn't matter.
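Stripping can be sketched as a regular expression over the XML 1.0 `Char` production. This is my own approximation that reproduces the `- qw` output above, not xmlrunner's actual code. It is lossy, so on its own it would hide exactly the bytes that make the two strings differ:

```python
import re

# Matches any character outside the XML 1.0 Char production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    "[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that XML 1.0 does not allow (information is lost)."""
    return _INVALID_XML10.sub("", text)


print(strip_invalid_xml_chars("- q\x00\x00\x02w\x00"))
print(strip_invalid_xml_chars("+ q\x00\x00\x02w\x00\x00"))
```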
I checked that the generated TEST-test_repro.TestRepro.xml contains only basic ASCII characters as follows:
$ xxd -ps -c 1 TEST-test_repro.TestRepro.xml | sort -u
09
0a
20
(...)
79
(the (...) mark indicates the part I've omitted, otherwise the listing would be unnecessarily long)
09 is horizontal tab (often denoted \t), 0a is line feed (often denoted \n), and everything from 20 to 7e (inclusive) is a printable character (see https://en.wikipedia.org/wiki/ASCII#Printable_characters), so there's nothing problematic.
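The same check as the `xxd -ps -c 1 ... | sort -u` pipeline can be written as a small Python sketch. The script and function names are hypothetical, and the allowed set (TAB, LF and 0x20 to 0x7E) is taken from the explanation above:

```python
import sys
from pathlib import Path

# Bytes considered harmless in the report: TAB, LF and printable ASCII 0x20-0x7E.
ALLOWED = {0x09, 0x0A} | set(range(0x20, 0x7F))


def unexpected_bytes(data: bytes) -> list[str]:
    """Return the distinct byte values outside ALLOWED, sorted, as hex strings."""
    return [f"{b:02x}" for b in sorted(set(data) - ALLOWED)]


if __name__ == "__main__" and len(sys.argv) > 1:
    bad = unexpected_bytes(Path(sys.argv[1]).read_bytes())
    print("only basic ASCII" if not bad else "unexpected bytes: " + " ".join(bad))
```

Usage, assuming the script is saved as check_ascii.py: `python check_ascii.py TEST-test_repro.TestRepro.xml`. Run against LuaUnit's report.xml, the same check would report the 00 and 02 bytes.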