acme-companion CI : default_cert test fails way too often / randomly on GitHub Actions


The default_cert test fails so often and so randomly on GitHub Actions that I had to remove it. GHA does not really have an "allow failure" option like Travis or other CI systems, nor a mechanism to restart single tests, which means I end up restarting the whole test run, sometimes 6 or 8 times in a row.

The test appears to pass locally, but I haven't run the tests locally very often lately, so I'm not 100% sure.

I tried doubling the timeout before the test fails (from 60 to 120 seconds) to no avail, and I doubt this was a timeout issue to begin with. It might very well be an issue with the feature itself.

If anyone is willing to investigate this, help would be appreciated.

Apr 26 '21 15:04 buchdag

> GHA does not really have a "allow failure" like Travis or other CI systems

Yes it does, see how I've done so here: use continue-on-error: true. Another approach, which I handle in a separate workflow, is to ensure a step always runs with if: ${{ always() }}.
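As a rough illustration of those two settings, here is a minimal workflow sketch. The workflow name, job name, step names and the test script path are hypothetical and not taken from either project; only continue-on-error and if: ${{ always() }} come from the comment above.

```yaml
# Minimal sketch; names and the test script path are hypothetical.
name: Tests
on: [push, pull_request]

jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run default_cert test (allowed to fail)
        # A failure here is still shown in the logs,
        # but it does not mark the job as failed.
        continue-on-error: true
        run: ./test/run.sh default_cert   # hypothetical test entry point

      - name: Collect container state
        # Runs even if an earlier step failed or the run was cancelled.
        if: ${{ always() }}
        run: docker ps -a
```

With continue-on-error set at the step level, the flaky test stays visible in the run output without blocking the rest of the job.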

> nor a mechanism to restart single tests, which mean I end up restarting the whole tests run sometimes 6 or 8 times in a row.

I believe that's possible, but it's not something I've tried myself. If the run can output which tests failed, I think that output can be used to trigger a re-run that takes the failed tests as input and covers only those.
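A simpler alternative to re-triggering a run, sketched here as an assumption rather than something either project does, is to retry the flaky test inside a single step. The script path and the attempt count are hypothetical.

```yaml
# Sketch of an in-job retry; this is not the re-run mechanism described
# above, just a simpler substitute. Paths and counts are hypothetical.
name: Tests with retry
on: [push, pull_request]

jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run default_cert test, up to 3 attempts
        shell: bash
        run: |
          for attempt in 1 2 3; do
            # Stop at the first passing attempt.
            if ./test/run.sh default_cert; then
              exit 0
            fi
            echo "Attempt ${attempt} failed" >&2
          done
          # Every attempt failed, so fail the step.
          exit 1
```

This only masks flakiness, so it is probably best kept as a stopgap while the underlying cause is investigated.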

It's probably similar to how I've split one workflow into two (build and deploy, for PR doc previews): the 2nd part is only triggered when the 1st part has completed successfully. Note the job condition: if: ${{ github.event.workflow_run.event == 'pull_request' && github.event.workflow_run.conclusion == 'success' }}. The 1st part of the split workflow also ensures stale runs are canceled (for example, when a new commit is pushed to a PR while a preview docs build is still running).
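A minimal sketch of such a split, assuming a first workflow named "Build" and a second one triggered by workflow_run. The workflow names, job names and steps are hypothetical, and using a concurrency group to cancel stale runs is an assumption: the comment above does not say how cancellation is done.

```yaml
# First workflow (hypothetical file: .github/workflows/build.yml)
name: Build
on: [pull_request]

# Assumption: cancel an in-progress run when a newer commit arrives
# for the same ref.
concurrency:
  group: build-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "build step goes here"
---
# Second workflow (hypothetical file: .github/workflows/deploy.yml)
name: Deploy
on:
  workflow_run:
    workflows: ["Build"]   # must match the first workflow's name
    types: [completed]

jobs:
  deploy:
    # Only run for PR-triggered builds that finished successfully.
    if: ${{ github.event.workflow_run.event == 'pull_request' && github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "Build run ${{ github.event.workflow_run.id }} succeeded"
```

The two documents are shown in one block for brevity; in a repository they would live in separate files.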

> If anyone is willing to investigate this, help would be appreciated.

I don't have time myself to contribute towards that, but I can say that I've found bats to be pretty great for running shell-script-based tests. I'm slowly refactoring our test suite, and a good example that I recently covered is our test for DH params.

We have a variety of helper functions that you're welcome to use :)

Sep 23 '21 23:09 polarathene

Thanks for the tips @polarathene, I'll look into all of this 😃

Sep 30 '21 16:09 buchdag

While working on a PR for nginx-proxy, I noticed they migrated away from a bats test suite to one written in Python. Since you maintain both, from what I understand, perhaps it would make sense to adopt that here (if you or any other maintainer ever finds the time to rewrite or port the tests).

Oct 01 '21 04:10 polarathene

That's pretty much what I had in mind long term (migrating to pytest instead of the jury-rigged bash test suite I wrote), but yeah, that will be some heavy work.

Oct 01 '21 06:10 buchdag
