dtls
E2E tests deadlock and consume tons of CI minutes
For some reason the E2E tests at times deadlock. A panic happens but the process never quits, until the CI job times out after 360 minutes (6 hours). https://github.com/pion/dtls/actions/runs/2376286601 is the latest example of this happening.
This wastes tons of the free GitHub Actions minutes we get (2000 per month, or about 33 hours), and though going over that isn't causing any problems currently, GitHub might change their mind about that eventually.
We need to fix this, but it's not immediately obvious to me what's breaking in the first place. The process panics, so realistically it should quit right then and there, not be killed by the 6hr timeout on Actions itself. We also have a test helper from pion/transport, which should ensure no individual test runs for more than 5 minutes, but that does not seem to be working on the E2E tests. @Sean-Der @at-wat if either of you has any ideas, that would be most helpful.
In the meantime, I would suggest we set timeout-minutes (https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes) to something much smaller for our jobs. I don't believe we have anything that can reasonably take more than an hour, so perhaps that's a place to start?
> This wastes tons of the free GitHub Actions minutes we get (2000 per month, or about 33 hours)
Free minutes only apply to private repos; public repos don't consume them.
I think it's better to set timeout-minutes on the job or step to cap the execution time.
Didn't see this one when I opened https://github.com/pion/dtls/issues/559, but it has now been fixed so I'm going to optimistically close this one as well :)