caffeine Unit tests for `prif_stop` and `prif_error_stop` make fragile non-portable assumptions

Open bonachea opened this issue 5 months ago • 1 comments

Currently, the approach taken to unit testing `prif_stop` and `prif_error_stop` is to unconditionally invoke `./build/run-fpm.sh` from within the fpm-built Caffeine unit test and inspect the resulting process exit code.

I consider this entire approach to be very fragile for multiple reasons:

- Assumes the Caffeine test executable is run from the source/build directory
- Assumes fpm (and possibly the compiler) are available on the compute node
- Assumes fpm is capable of launching parallel jobs at all
- Assumes parallel jobs can be launched at all (by any command) from the compute node
- Currently appears to have EVERY image launch the subjob
- Relies on process exit code propagation, which can be unreliable in loosely coupled distributed systems

I expect one or more of the above assumptions to be violated on some systems (completely breaking the Caffeine unit test) once we incorporate distributed conduits and non-trivial job spawners.

As such, we'll eventually need a "kill switch" to disable this practice, or better yet a more robust approach to exit testing that doesn't rely on programmatically invoking fpm to spawn a sub-job.
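To make the "kill switch" idea concrete, here is a minimal sketch of the control flow. It is written in Python purely as a stand-in, since the real test is Fortran (where the standard intrinsics `get_environment_variable` and `execute_command_line` could play the same roles). The environment variable name `CAF_SKIP_EXIT_TESTS` and both function names are hypothetical; nothing like them exists in Caffeine today as far as this issue states. The sketch also treats a launcher that cannot be started as a skip rather than a failure, which is one possible policy, not a settled one.

```python
import os
import subprocess
import sys

# Hypothetical kill-switch variable; the name is an assumption for illustration.
SKIP_VAR = "CAF_SKIP_EXIT_TESTS"


def exit_tests_enabled(env=None):
    """Return False when the (hypothetical) kill switch is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get(SKIP_VAR, "").strip().lower() not in ("1", "true", "yes", "on")


def check_exit_code(command, expect_nonzero, env=None):
    """Run `command` as a sub-job and classify the result.

    Returns "skipped" if the kill switch is set or the command cannot be
    launched at all, "passed" if the exit status matches the expectation,
    and "failed" otherwise. `env` is only consulted for the kill switch;
    the sub-job inherits the real environment.
    """
    if not exit_tests_enabled(env):
        return "skipped"
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # Launcher missing or not executable on this node: skip, do not fail.
        return "skipped"
    got_nonzero = result.returncode != 0
    return "passed" if got_nonzero == expect_nonzero else "failed"


if __name__ == "__main__":
    # A child Python process stands in for the real error-stop test program.
    stand_in = [sys.executable, "-c", "raise SystemExit(1)"]
    print(check_exit_code(stand_in, expect_nonzero=True))
```

Note that this only addresses the "disable" half of the request: it still relies on exit code propagation when the tests are enabled, and it does nothing about every image launching the sub-job.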

Sep 12 '24 03:09 bonachea
