
Timeout issues with mac runners

Open pancetta opened this issue 2 years ago • 3 comments

Looks like the mac runners (esp. the one with MPI) keep running into the artificial timeout imposed on the tests. Options:

  • We could remove the timeout in general, but then adding new tests can silently blow up the runtime of the test suite.
  • We could remove the timeout for the particular tests, but they actually run quickly on other runners.
  • We could try to remove/increase the timeout for the particular tests on the particular platform, but this might add complexity to the CI workflow.
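For illustration, the third option could be sketched in a `conftest.py` roughly as follows. This is only a sketch under assumptions: it presumes the timeout comes from the `pytest-timeout` plugin (which reads a `timeout` marker), that the MPI tests carry a marker named `mpi`, and that 900 seconds is an acceptable relaxed limit. The marker name, the limit, and the helper `timeout_override` are hypothetical and not taken from the pySDC repository.

```python
import sys

# Hypothetical relaxed limit in seconds; not a value from the project.
MACOS_MPI_TIMEOUT = 900


def timeout_override(platform, marker_names, relaxed=MACOS_MPI_TIMEOUT):
    """Return a relaxed timeout for MPI tests on macOS, or None to keep the default."""
    if platform == "darwin" and "mpi" in marker_names:
        return relaxed
    return None


def pytest_collection_modifyitems(config, items):
    """pytest hook: relax the timeout only for the affected tests on macOS."""
    import pytest  # imported here so the helper above can be used without pytest

    for item in items:
        names = {mark.name for mark in item.iter_markers()}
        limit = timeout_override(sys.platform, names)
        if limit is not None:
            # append=False inserts the marker before the test's own markers,
            # so it should take precedence over an existing timeout marker.
            item.add_marker(pytest.mark.timeout(limit), append=False)
```

All other tests and platforms keep the globally configured timeout, so the extra complexity stays in one hook rather than in the CI workflow file.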

Thoughts?

pancetta, Nov 09 '23 09:11

Since the artificial timeout seems to be exceeded only by the MPI tests, we could remove it just for those tests. I learnt that tests are usually written to have a short runtime anyway, and the runtime of the tests is monitored (I assume). So on the runners where the timeout is removed, we could still check the runtime "manually" and give a hint that a test should be rewritten to run faster if needed (if that makes sense). But this is probably only a short-term solution.

Unfortunately, I'm not familiar with this CI workflow or with how complex it would be to adapt the timeout only for these particular tests on this particular platform, but it is probably worth spending the time on. Checking the runtime by hand every time a PR is pushed is not convenient, and convenience is exactly what the CI workflow should provide to users as well as developers.

lisawim, Nov 09 '23 12:11

We could run only the tests that are not marked as slow on the macOS environment. That would, of course, slightly reduce confidence in the code, but not unreasonably so, since there are only a few slow tests, imho. Or I can take a look at the test that is always failing to see if I can replace it with something faster.
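A rough sketch of the first idea, assuming the slow tests carry a pytest marker literally named `slow` (the exact marker name and the helper `should_skip` are assumptions for illustration). The same effect could probably be had by passing `-m "not slow"` to pytest in the macOS job only; the hook below keeps the decision inside the test suite instead.

```python
import sys


def should_skip(platform, marker_names):
    """Decide whether a test is skipped: only slow tests, and only on macOS."""
    return platform == "darwin" and "slow" in marker_names


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip tests marked as slow when running on macOS."""
    import pytest  # imported here so the helper above can be used without pytest

    skip_slow = pytest.mark.skip(reason="slow tests are not run on macOS runners")
    for item in items:
        names = {mark.name for mark in item.iter_markers()}
        if should_skip(sys.platform, names):
            item.add_marker(skip_slow)
```

Skipped tests still show up in the report with the given reason, so the reduced coverage on macOS stays visible.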

brownbaerchen, Nov 09 '23 12:11

Good points, @lisawim and @brownbaerchen. I introduced the timeout plugin to avoid having to monitor the timings manually. Tests SHOULD be designed to run quickly, but developers have to take care of that themselves. If they don't, and nobody monitors the timings, the duration of the test suite will get out of hand (even more).

I'm pretty sure the key problem in this case is MPI. These tests should actually run on dedicated hardware and not on shared GitHub runners, but getting this to work reliably is not straightforward (though doable). This touches on the question of continuous benchmarking, btw.

pancetta, Nov 10 '23 06:11