Unit tests for `prif_stop` and `prif_error_stop` make fragile, non-portable assumptions
Currently, the approach taken to unit testing `prif_stop` and `prif_error_stop` is to unconditionally invoke `./build/run-fpm.sh` from within the fpm-built Caffeine unit test and then inspect the resulting process exit code.
I consider this entire approach to be very fragile for multiple reasons:
- Assumes the Caffeine test executable is run from the source/build directory
- Assumes `fpm` (and possibly the compiler) are available on the compute node
- Assumes `fpm` is capable of launching parallel jobs at all
- Assumes parallel jobs can be launched at all (by any command) from the compute node
- Currently appears to have EVERY image launch the sub-job
- Relies on process exit code propagation, which can be unreliable in loosely coupled distributed systems
I expect one or more of the above assumptions to be violated on some systems (completely breaking the Caffeine unit test) once we incorporate distributed conduits and non-trivial job spawners.
As such, we'll eventually need a "kill switch" to disable this practice, or better yet, a more robust approach to exit testing that doesn't rely on programmatically invoking `fpm` to spawn a sub-job.
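One possible shape for such a kill switch is a simple environment-variable check. This is a minimal sketch under stated assumptions: the variable name `CAFFEINE_SKIP_EXIT_TESTS`, the module `exit_test_switch_m`, and the function `exit_tests_enabled` are all hypothetical and not part of Caffeine. It uses only the standard `get_environment_variable` intrinsic, so setting the variable to any non-empty value would disable the exit-code tests without requiring a rebuild.

```fortran
module exit_test_switch_m
  ! Hypothetical module illustrating a kill switch; not part of Caffeine.
  implicit none
  private
  public :: exit_tests_enabled

contains

  logical function exit_tests_enabled()
    ! Returns .false. when the hypothetical environment variable
    ! CAFFEINE_SKIP_EXIT_TESTS is set to a non-empty value, .true. otherwise.
    integer :: length, status

    ! status == 0 means the variable exists and length holds the length of
    ! its value. Any other status (variable absent, or no environment support
    ! on this processor) leaves the exit-code tests enabled.
    call get_environment_variable("CAFFEINE_SKIP_EXIT_TESTS", &
                                  length=length, status=status)
    exit_tests_enabled = .not. (status == 0 .and. length > 0)
  end function exit_tests_enabled

end module exit_test_switch_m
```

A `prif_stop`/`prif_error_stop` test would call `exit_tests_enabled()` before spawning the sub-job and report itself as skipped otherwise. Note that an opt-out switch like this only limits the damage on systems where the assumptions listed above fail; it does not remove the underlying dependence on `fpm` or on exit-code propagation.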