
Run our large test suite in GitHub Actions

apyrgio opened this issue · 3 comments

What feature do you think would be a good addition to Dangerzone?

Dangerzone has a test suite with a large number of documents, which we run before each release. We usually run it locally, but it takes a lot of time and disk space. It would be best to run it in GitHub Actions. The test suite is documented here: https://github.com/freedomofpress/dangerzone/blob/main/docs/developer/TESTING.md

Is your feature request related to a problem? Please describe.

This test suite is useful for detecting regressions in our code, so it's good to run it at regular intervals.

Additional context

Anecdotally, running this test suite locally takes about a day. On the other hand, GitHub's Usage Limits state that jobs can run for a maximum of 6 hours:

Job execution time - Each job in a workflow can run for up to 6 hours of execution time. If a job reaches this limit, the job is terminated and fails to complete.

Luckily, GitHub allows us to run 20 concurrent jobs on the free plan. There are people who have taken advantage of this fact to parallelize their pytest invocations: https://github.com/jerry-git/pytest-split-gh-actions-demo/blob/master/.github/workflows/test.yml

See also:

  • https://github.com/jerry-git/pytest-split
  • https://github.com/mark-adams/pytest-test-groups
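The parallelization approach linked above can be sketched as a GitHub Actions matrix job. This is a hypothetical workflow, not the project's actual configuration: the file name, Python version, and install step are assumptions, and the `--test-group-count`/`--test-group` flags come from pytest-test-groups' documented options.

```yaml
# .github/workflows/large-tests.yml (hypothetical name)
name: large-test-suite
on:
  workflow_dispatch:  # run on demand

jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false  # let all shards finish even if one fails
      matrix:
        # 20 shards, matching the free plan's 20 concurrent jobs
        group: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pytest pytest-test-groups
      # Each runner collects the full suite but executes only its 1/20th
      - run: pytest --test-group-count 20 --test-group ${{ matrix.group }}
```

With `fail-fast: false`, a failing document in one shard doesn't cancel the other 19 runners, so a single run still reports the full failure list.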

apyrgio · Oct 23 '24

As part of releasing 0.9.0, we have to run our large test suite. I want to timebox this particular task (and #657), to see if it's possible to run our large test suite in GitHub Actions with as few modifications as possible. I'll allocate one day to this task, and if I can't make it, I'll report back.

apyrgio · Mar 05 '25

We now have a way to run our large test suite in GitHub Actions, in ~40 minutes total (example run), by sharding our Python tests into 20 groups and running them concurrently across 20 GitHub Actions runners :rocket:.

Previously, running this test suite required about a day and some debugging to install the proper packages, and it very rarely ran to completion. It would also hog the machine it was running on for the duration of the test run.

Some extra takeaways from this endeavor:

  1. Running our large tests against the main branch reports 84 failures in 7819 docs (~1% failure rate). This failure rate is consistent with previous runs (see 0.6.0 results).

  2. The reported error reasons look relatively benign to me, given that our test set contains corrupted/encrypted office docs and PDFs:

    All failures:
      195 Converting page X/X to pixels
       35 Conversion to PDF with LibreOffice failed
       24 document closed or encrypted
       10 bad start page number
       10 Installing LibreOffice extension 'hXorestart.oxt'
        6 The document format is not supported
        3 
        2 kill container: No such process
        2 cannot find page X in page tree
        1 image width is zero (or less)
        1 Corrupt JPEG data: premature end of data segment
    
  3. We have identified a document (ofz21385-1.doc) that can make LibreOffice hang, and by extension Dangerzone. Since we removed timeouts, we are aware that such hangs may occur, so they are not a regression. Note that we track this limitation in issue #878.

  4. The combined time of the whole test suite was 26,937 seconds, which is roughly 7.5 hours. This shows that the removal of the extra uncompressed PDFs and the OCR step played an important role.

  5. Regarding pytest-split vs pytest-test-groups:

    • pytest-split has more bells and whistles, but assumes you have run the tests beforehand and recorded their durations. This is impractical in our case, I'm afraid, because we had not yet managed to run the tests to completion :sweat_smile:. You can run the tests without storing the durations, but then you can't load-balance them effectively.
    • pytest-test-groups, on the other hand, is much simpler and can pick a random sample of the tests. A random sample is fine for now, so I moved forward with that.
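The trade-off in item 5 can be illustrated with a toy sketch: duration-aware splitting (what pytest-split enables once it has timing data) balances shards much better than blind modulo sharding when test durations vary wildly. This is illustrative only; neither plugin's actual implementation looks like this, and the function names and fake durations are made up.

```python
def modulo_shard(tests, group_count):
    """Assign test i to shard i % group_count, ignoring durations."""
    shards = [[] for _ in range(group_count)]
    for i, test in enumerate(tests):
        shards[i % group_count].append(test)
    return shards


def duration_shard(tests, durations, group_count):
    """Greedy longest-processing-time scheduling: place each test,
    slowest first, on the currently least-loaded shard."""
    shards = [[] for _ in range(group_count)]
    loads = [0.0] * group_count
    for test in sorted(tests, key=lambda t: durations[t], reverse=True):
        target = loads.index(min(loads))
        shards[target].append(test)
        loads[target] += durations[test]
    return shards


if __name__ == "__main__":
    # Fake suite: one very slow document, 39 fast ones.
    durations = {"doc0": 600.0, **{f"doc{i}": 10.0 for i in range(1, 40)}}
    tests = list(durations)

    def slowest(shards):
        # Wall-clock time is dictated by the slowest shard.
        return max(sum(durations[t] for t in s) for s in shards)

    print(slowest(modulo_shard(tests, 4)))               # → 690.0
    print(slowest(duration_shard(tests, durations, 4)))  # → 600.0
```

With duration data, the slow document gets a shard to itself and the wall-clock time drops to the single slowest test; without it, whichever shard happens to contain the slow document also carries a full share of fast tests.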

apyrgio · Mar 12 '25

Just updated the milestone to 0.11.0 as we don't need to block on this for the current release.

It's worth mentioning that having it as a PR here makes it possible to run it whenever we want, while waiting for it to be merged.
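One way to get that run-on-demand behavior, whether from a PR branch or after the merge, is a manual trigger. A minimal sketch (the `workflow_dispatch` and `schedule` triggers are standard GitHub Actions syntax; the cron value is just an example):

```yaml
on:
  # Adds a "Run workflow" button in the Actions tab, so the large
  # suite only runs when someone asks for it.
  workflow_dispatch:
  # Optionally also run unattended, e.g. weekly on Monday at 03:00 UTC.
  schedule:
    - cron: "0 3 * * 1"
```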

almet · Oct 06 '25