flux-sched
`t4004-match-hwloc.t` fails if flux curve keys do not exist
❯ make check
<snip>
PASS: t4003-cancel-info.t 7 - removing resource works
ERROR: t4004-match-hwloc.t - missing test plan
ERROR: t4004-match-hwloc.t - exited with status 137 (terminated by signal 9?)
PASS: t4005-match-unsat.t 1 - loading resource module with a tiny machine config works
<snip>
❯ ./t4004-match-hwloc.t
flux-broker: zsecurity_comms_init: The directory '/home/sherbein/.flux' does not exist. Have you run "flux keygen"?
flux-broker: overlay_bind failed: No such file or directory
flux-broker: bootstrap failed
flux-broker: zsecurity_comms_init: The directory '/home/sherbein/.flux' does not exist. Have you run "flux keygen"?
flux-broker: overlay_bind failed: No such file or directory
flux-broker: bootstrap failed
flux-start: 0 (pid 920235) exited with rc=1
flux-start: 1 (pid 920236) exited with rc=1
flux-start: 2 (pid 920237) Killed
flux-start: 3 (pid 920238) Killed
I see three options (please suggest more if you have them):
- Have the sharness script in flux-sched check for the keys and if they do not exist auto-generate them:
diff --git a/t/sharness.d/sched-sharness.sh b/t/sharness.d/sched-sharness.sh
index 29ae36f1..2608320e 100644
--- a/t/sharness.d/sched-sharness.sh
+++ b/t/sharness.d/sched-sharness.sh
@@ -21,6 +21,7 @@ fi
## Set up environment using flux(1) in PATH
flux --help >/dev/null 2>&1 || error "Failed to find flux in PATH"
+[[ -f $HOME/.flux/curve/client ]] || flux keygen
eval $(flux env)
- Have the sharness script in flux-sched check for the keys and if they do not exist skip tests that require them
- Leave flux-sched as it is and just update our quickstart guide on readthedocs to require running `flux keygen` before building flux-sched.
My preference is the first option, but I'm not sure whether there are any potential "gotchas" with doing that (@garlick?).
Thoughts?
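For comparison, the second option above (skip rather than auto-generate) might look roughly like the sketch below. The helper name `have_curve_keys` is hypothetical and not part of flux-sched or sharness. The file layout (`curve/client` and `curve/server` under `~/.flux`) is an assumption based on the paths in the diff and the error message above.

```shell
#!/bin/sh
# Sketch only: decide whether tests that start several brokers can run.
# have_curve_keys is a hypothetical helper; the key file names are assumed.
have_curve_keys() {
    keydir="${1:-$HOME/.flux}"
    [ -f "$keydir/curve/client" ] && [ -f "$keydir/curve/server" ]
}

# In a sharness script the check would typically become a prerequisite:
#   have_curve_keys && test_set_prereq CURVE_KEYS
#   test_expect_success CURVE_KEYS 'multi-broker hwloc test' '...'
if have_curve_keys; then
    echo "curve keys found: multi-broker tests can run"
else
    echo "curve keys missing: multi-broker tests would be skipped"
fi
```

The downside of this approach is that a fresh checkout silently skips the hwloc tests until someone runs `flux keygen`, which is why auto-generating the keys may be preferable.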
What does flux-core do in this case? If it handles this sufficiently well, we should use the same trick. Otherwise, this should be fixed in both places.
I guess this is still an issue. Wouldn't it be better to handle this in sharness.d/flux-sharness.sh so that it is solved for both flux-core and fluxion?
Looks like flux-core runs flux-keygen during make and saves some keys in the source tree for use while testing:
> make -j
<snip>
make[1]: Entering directory '/usr/src/etc'
GEN flux/.nodocs
GEN flux/curve
GEN flux/help.d/core.json
Saving /usr/src/etc/flux/curve/client
Saving /usr/src/etc/flux/curve/client_private
Saving /usr/src/etc/flux/curve/server
Saving /usr/src/etc/flux/curve/server_private
That was in a container, so /usr/src was the git repo/source tree.
I suspect we could do the same in flux-sched: stick the equivalent of `flux keygen --secdir=$BUILDDIR/etc/flux/curve` in the etc/Makefile.am, and then set `FLUX_SEC_DIRECTORY` to that build directory in the sharness script. (I'm not sure how flux-core gets away with not setting the environment variable.)
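The sharness side of that idea could be sketched as below. This is only an illustration: `select_sec_dir` is a hypothetical helper, and whether `FLUX_SEC_DIRECTORY` should point at the `curve` directory itself or at its parent is something I have not verified.

```shell
#!/bin/sh
# Sketch only. select_sec_dir is a hypothetical helper: it prefers a key
# directory generated in the build tree and otherwise falls back to the
# user's own directory.
select_sec_dir() {
    candidate="$1"
    fallback="$2"
    if [ -d "$candidate" ]; then
        echo "$candidate"
    else
        echo "$fallback"
    fi
}

# Hypothetical use from the sharness setup script (paths are assumptions):
#   FLUX_SEC_DIRECTORY=$(select_sec_dir "$BUILDDIR/etc/flux/curve" "$HOME/.flux")
#   export FLUX_SEC_DIRECTORY
```

Falling back to the user's directory keeps the current behavior for anyone who already has keys and builds without the generated ones.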
> I'm not sure how flux-core gets away with not setting the environment variable
There are a number of compiled-in paths that are altered if flux detects that it is running inside the flux-core source tree. The key dir is one of those. So we cheat - sorry!
It doesn't help now, but there is flux-framework/flux-core#2767 which would eliminate the need for users to have keys. I think we may want to bump this up in priority for our TOSS 4 deliverable since reading keys out of NFS directories is generally frowned upon in LC.
In the meantime, do those tests really need to start 4 brokers? The resource set is being provided as test input. Keys are only required if broker-to-broker connections need to be established.
Edit: so changing `test_under_flux 4` to `test_under_flux 1` in the two hwloc tests would be another workaround.
> So we cheat - sorry!
😆
> It doesn't help now, but there is flux-framework/flux-core#2767 which would eliminate the need for users to have keys.
If flux-framework/flux-core#2767 is going to solve this eventually anyway, I'd be happy to wait for a PR on that to land and by proxy handle this issue as well.
> In the meantime, do those tests really need to start 4 brokers?
Maybe there is a way around it, but I think 4 ranks are required so that we can run tests with 4 "nodes" worth of hwloc data (from 4 separate hwloc xml files, 1 per rank).