Run our large tests in parallel in GitHub actions
Our large tests where always hard to run on our own infra, because they require too much time, and too much space. Thankfully, GitHub actions allows us to run 20 concurrent jobs, which we can use it to our advantage. In this PR we:
- Add the
pytest-test-groupsdependency, which can split a large pytest run into groups, and sample them with a pre-determined seed. - Use that dependency in our
Makefile, and define some envvars that control the number of groups, the specific group to run, and the random seed. - Add a new CI job in
large-tests.yml, which installs the necessary requirements, builds the container image (if not built already), and runs each group in a separate Ubuntu runner
Closes #969
I have rebased this PR on top of test/main-icu, since our container image and our LibreOffice version has significantly changed since the last time this test ran. Here are the reports from the 0.9.0 run, and the ones from the 0.10.0 one:
(note, I couldn't upload the original result.xml files because they are over the file size limit (51MB > 25MB))
Besides the change in the container image, this new run contains ~400 more documents, because Dangerzone no longer hangs when processing group 13. Here is the difference between the new run and the previous one:
--- /home/apyrgio/Downloads/large-tests-report-0.9.0.txt 2025-10-22 18:03:37.968353083 +0300
+++ /home/apyrgio/Downloads/large-tests-report-0.10.0.txt 2025-10-22 18:04:23.215343422 +0300
@@ -1,10 +1,10 @@
==== RESULTS SUMMARY ===
errors: 0
- failures: 84
+ failures: 92
skipped: 0
- tests: 7819
- failure rate: 0.010743061772605193
+ tests: 8230
+ failure rate: 0.01117861482381531
=== TEST OVERVIEW ===
@@ -32,51 +32,127 @@
=== MOST COMMON CONTAINER OUTPUT ===
Top 30:
- 7737
- 2893 convert /tmp/input_file -> /tmp/input_file.pdf using filter : writer_pdf_Export
- 1236 convert /tmp/input_file -> /tmp/input_file.pdf using filter : calc_pdf_Export
- 693 convert /tmp/input_file -> /tmp/input_file.pdf using filter : impress_pdf_Export
- 139 Installing LibreOffice extension 'hXorestart.oxt'
- 129 Archive: /opt/libreoffice_ext/hXorestart.oxt
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/TypeDetection.xcu
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_types.xcu
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_filters.xcu
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/logger.properties
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/icon/HXOrestart.png
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_ko.txt
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_en.txt
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description.xml
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/META-INF/manifest.xml
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/HXOrestart.jar
- 46 convert /tmp/input_file -> /tmp/input_file.pdf using filter : draw_pdf_Export
- 35 Conversion to PDF with LibreOffice failed
- 24 document closed or encrypted
- 10 libpng warning: iCCP: known incorrect sRGB profile
- 10 bad start page number
- 9 Corrupt JPEG data: premature end of data segment
- 6 The document format is not supported
- 2 libpng warning: sBIT: invalid
- 2 kill container: No such process
- 2 cannot find page X in page tree
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container ffeXeebdXcccXeXadXdfbfeXeXfdXdaXeXcXcXcaXfX: container has already been removed"
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container fdeXeXddaXaXdcffXfXeXeXafdXdXcdXcXfX: container has already been removed"
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container fdaXbXbaXdaXeXcXdaXfXeXfXdeeXfXffXcXdbX: container has already been removed"
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container fdXfXaXeXaXcXacbfXfXdXeXfXaXcdXbXbfXebXfaXdbXdaX: no container with ID fdXfXaXeXaXcXacbfXfXdXeXfXaXcdXbXbfXebXfaXdbXdaX found in database: no such container"
+ 10099
+ 3055 convert /tmp/input_file as a Writer document -> /tmp/input_file.pdf using filter : writer_pdf_Export
+ 1303 convert /tmp/input_file as a Calc document -> /tmp/input_file.pdf using filter : calc_pdf_Export
+ 735 convert /tmp/input_file as a Impress document -> /tmp/input_file.pdf using filter : impress_pdf_Export
+ 553 MuPDF error: syntax error: syntax error in content stream
+ 240 MuPDF error: unsupported error: cannot create appearance stream for BCLC_PDmorFZone annotations
+ 143 Installing LibreOffice extension 'hXorestart.oxt'
+ 133 Archive: /opt/libreoffice_ext/hXorestart.oxt
+ 133 extracting: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/mimetype
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/types.rdb
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/TypeDetection.xcu
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_types.xcu
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_filters.xcu
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/logger.properties
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/icon/HXOrestart.png
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_ko.txt
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_en.txt
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description.xml
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/META-INF/manifest.xml
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/HXOrestart.jar
+ 103 MuPDF error: syntax error: syntax error in array
+ 93 MuPDF error: syntax error: invalid key in dict
+ 68 MuPDF error: syntax error: unknown keyword: 'X.X.X'
+ 48 convert /tmp/input_file as a Draw document -> /tmp/input_file.pdf using filter : draw_pdf_Export
+ 41 Conversion to PDF with LibreOffice failed
+ 34 MuPDF error: unsupported error: cannot create appearance stream for Screen annotations
+ 32 MuPDF error: syntax error: cannot find ExtGState resource ''
+ 29 MuPDF error: syntax error: cannot find ExtGState resource 'RX'
+ 26 MuPDF error: syntax error: unknown keyword: '-XeX'
+ 25 document closed or encrypted
=== FAILURE REASONS ===
All failures:
- 195 Converting page X/X to pixels
- 35 Conversion to PDF with LibreOffice failed
- 24 document closed or encrypted
- 10 bad start page number
+ 196 Converting page X/X to pixels
+ 185
+ 64 MuPDF error: syntax error: syntax error in content stream
+ 41 Conversion to PDF with LibreOffice failed
+ 25 document closed or encrypted
+ 21 MuPDF error: format error: corrupt object stream (X X R)
+ 16 Traceback (most recent call last):
+ 16 Exception was from C++/Python callback:
+ 16 ========
+ 16 virtual void SwigDirector_DiagnosticCallback::_print(const char*)
+ 16 <frozen runpy>:X:_run_module_as_main():
+ 16 <frozen runpy>:X:_run_code():
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/utils.py:X:get_pixmap(): dl = page.get_displaylist(annots=annots)
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/__init__.py:X:message(): print(text, file=_g_out_message, flush=X)
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/__init__.py:X:get_displaylist(): dl = mupdf.fz_new_display_list_from_page(self.this)
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/__init__.py:X:JM_mupdf_error(): message(f'MuPDF error: {text}\n')
+ 16 /usr/lib/pythonX/dist-packages/mupdf/__init__.py:X:fz_new_display_list_from_page(): return _mupdf.fz_new_display_list_from_page(page)
+ 16 /usr/lib/pythonX/dist-packages/mupdf/__init__.py:X:_print(): self.printfn( message)
+ 16 /usr/lib/pythonX/dist-packages/mupdf/__init__.py:X:_print(): raise
+ 16 /usr/lib/pythonX.X/asyncio/runners.py:X:run(): return self._loop.run_until_complete(task)
+ 16 /usr/lib/pythonX.X/asyncio/runners.py:X:run(): return runner.run(main)
+ 16 /usr/lib/pythonX.X/asyncio/events.py:X:_run(): self._context.run(self._callback, *self._args)
+ 16 /usr/lib/pythonX.X/asyncio/base_events.py:X:run_until_complete(): self.run_forever()
+ 16 /usr/lib/pythonX.X/asyncio/base_events.py:X:run_forever(): self._run_once()
+ 16 /usr/lib/pythonX.X/asyncio/base_events.py:X:_run_once(): handle._run()
+ 16 /opt/dangerzone/dangerzone/conversion/doc_to_pixels.py:X:main(): await converter.convert()
+ 16 /opt/dangerzone/dangerzone/conversion/doc_to_pixels.py:X:convert(): pix = page.get_pixmap(dpi=DEFAULT_DPI)
+ 16 /opt/dangerzone/dangerzone/conversion/doc_to_pixels.py:X:<module>(): sys.exit(asyncio.run(main()))
+ 10 MuPDF error: syntax error: syntax error in array
10 Installing LibreOffice extension 'hXorestart.oxt'
+ 8 Director exception handler, message is:
6 The document format is not supported
- 3
+ 3 MuPDF error: syntax error: unknown keyword: 'X.X.X'
+ 3 MuPDF error: syntax error: unknown keyword: '.X.X'
+ 3 MuPDF error: syntax error: invalid key in dict
+ 3 MuPDF error: syntax error: cannot find ExtGState resource 'RX'
+ 3 MuPDF error: format error: cannot find object in xref (X X R)
+ 3 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udccd' in position X: surrogates not allowed
+ 3 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udccd' in position X: surrogates not allowed
+ 3 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: '-X.X\udccd'\n", X, X, 'surrogates not allowed')
2 kill container: No such process
- 2 cannot find page X in page tree
- 1 image width is zero (or less)
- 1 Corrupt JPEG data: premature end of data segment
+ 2 code=X: cannot find page X in page tree
+ 2 MuPDF error: syntax error: unknown keyword: 'jX'
+ 2 MuPDF error: syntax error: unknown keyword: 'Xd'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.Xa'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.X.X.X'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.X.'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.X'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.'
+ 2 MuPDF error: syntax error: unknown keyword: '-XsTd'
+ 2 MuPDF error: syntax error: unknown keyword: '-XP.'
+ 2 MuPDF error: library error: zlib error: invalid distance too far back
+ 2 MuPDF error: format error: non-page object in page tree
+ 2 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcfa' in position X: surrogates not allowed
+ 2 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udceX' in position X: surrogates not allowed
+ 2 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcfa' in position X: surrogates not allowed
+ 2 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udceX' in position X: surrogates not allowed
+ 1 MuPDF error: syntax error: unknown keyword: 'Xu.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'Xr'
+ 1 MuPDF error: syntax error: unknown keyword: 'XmX.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'Xm'
+ 1 MuPDF error: syntax error: unknown keyword: 'Xf'
+ 1 MuPDF error: syntax error: unknown keyword: 'XTX.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'XP.'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.XiX.'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.Xi'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.XhX'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.XTJ'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.X.Xo'
+ 1 MuPDF error: syntax error: unknown keyword: 'X..X.Xo'
+ 1 MuPDF error: syntax error: unknown keyword: 'X..X.X.X.X.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'T.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'SX'
+ 1 MuPDF error: syntax error: unknown keyword: 'ETX'
+ 1 MuPDF error: syntax error: unknown keyword: '.X.'
+ 1 MuPDF error: syntax error: unknown keyword: '-XnX'
+ 1 MuPDF error: syntax error: unknown keyword: '-XA'
+ 1 MuPDF error: syntax error: unknown keyword: '-X.XsTd'
+ 1 MuPDF error: syntax error: unknown keyword: '-X.Xf*'
+ 1 MuPDF error: syntax error: unknown keyword: '-X..X'
+ 1 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcX' in position X: surrogates not allowed
+ 1 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcX' in position X: surrogates not allowed
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: 'Xa\udcfaX.X'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: 'X.X\udceX'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: 'X.X\udcX'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: '-X.Xa\udcfaX.X'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: '-X.X.X\udceX.X'\n", X, X, 'surrogates not allowed')
=== TIMEOUTS ===
With the goal of not letting this PR open for too long, I'm proposing we merge it as-is, as it's still better than having to maintain the branch on the long run. We can open issues about it if we want.
The standing comments are small adjustments, shouldn't stand in the way of merging this, IMO.