flux-sched `t4004-match-hwloc.t` fails if flux curve keys do not exist

❯ make check  
<snip>
PASS: t4003-cancel-info.t 7 - removing resource works
ERROR: t4004-match-hwloc.t - missing test plan
ERROR: t4004-match-hwloc.t - exited with status 137 (terminated by signal 9?)
PASS: t4005-match-unsat.t 1 - loading resource module with a tiny machine config works
<snip>
❯ ./t4004-match-hwloc.t
flux-broker: zsecurity_comms_init: The directory '/home/sherbein/.flux' does not exist.  Have you run "flux keygen"?
flux-broker: overlay_bind failed: No such file or directory
flux-broker: bootstrap failed
flux-broker: zsecurity_comms_init: The directory '/home/sherbein/.flux' does not exist.  Have you run "flux keygen"?
flux-broker: overlay_bind failed: No such file or directory
flux-broker: bootstrap failed
flux-start: 0 (pid 920235) exited with rc=1
flux-start: 1 (pid 920236) exited with rc=1
flux-start: 2 (pid 920237) Killed
flux-start: 3 (pid 920238) Killed

I see three options (please suggest more if you have them):

1. Have the sharness script in flux-sched check for the keys and, if they do not exist, auto-generate them:

diff --git a/t/sharness.d/sched-sharness.sh b/t/sharness.d/sched-sharness.sh
index 29ae36f1..2608320e 100644
--- a/t/sharness.d/sched-sharness.sh
+++ b/t/sharness.d/sched-sharness.sh
@@ -21,6 +21,7 @@ fi

 ## Set up environment using flux(1) in PATH
 flux --help >/dev/null 2>&1 || error "Failed to find flux in PATH"
+[[ -f $HOME/.flux/curve/client ]] || flux keygen
 eval $(flux env)

2. Have the sharness script in flux-sched check for the keys and, if they do not exist, skip the tests that require them.
3. Leave flux-sched as it is and just update our quickstart guide on readthedocs to require running flux keygen before building flux-sched.
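
As a rough illustration of the key check that options 1 and 2 share, the test could be factored into a small helper. This is only a sketch: `have_curve_keys` is a hypothetical name, and the assumed layout (a `curve/` subdirectory holding `client` and `server` files) is inferred from the paths quoted in this issue rather than from any documented guarantee. The commented usage relies on the usual sharness `skip_all` / `test_done` idiom.

```sh
# Hypothetical helper: succeed only if CURVE key files are present under
# the given security directory (default: $HOME/.flux).
# Assumption: keys are stored as <secdir>/curve/client and
# <secdir>/curve/server; adjust if your flux-core version differs.
have_curve_keys() {
    secdir="${1:-$HOME/.flux}"
    [ -f "$secdir/curve/client" ] && [ -f "$secdir/curve/server" ]
}

# Option 1, sketched (generate keys when they are missing):
#   have_curve_keys || flux keygen
#
# Option 2, sketched (skip the whole test file before any broker starts):
#   if ! have_curve_keys; then
#       skip_all='skipping: no CURVE keys found, run "flux keygen"'
#       test_done
#   fi
```

For option 2 the check would have to run before `test_under_flux`, since the failure shown above happens while the brokers bootstrap, not inside an individual test.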

My preference is 1, but I'm not sure if there are any potential "gotchas" with doing that (@garlick?).

Thoughts?

May 09 '20 02:05 SteVwonder

What does flux-core do in this case? If it handles this sufficiently well, we should use the same trick. Otherwise, this should be fixed in both places.

May 15 '20 22:05 dongahn

I guess this is still an issue. Wouldn't it be better to handle this in sharness.d/flux-sharness.sh so that it can be solved for both flux-core and fluxion?

Jul 28 '20 23:07 dongahn

Looks like flux-core runs flux-keygen during make and saves some keys in the source tree for use while testing:

> make -j
<snip>
make[1]: Entering directory '/usr/src/etc'
  GEN      flux/.nodocs
  GEN      flux/curve
  GEN      flux/help.d/core.json
Saving /usr/src/etc/flux/curve/client
Saving /usr/src/etc/flux/curve/client_private
Saving /usr/src/etc/flux/curve/server
Saving /usr/src/etc/flux/curve/server_private

That was in a container, so /usr/src was the git repo/source tree.

I suspect we could do the same in flux-sched: put the equivalent of `flux keygen --secdir=$BUILDDIR/etc/flux/curve` in etc/Makefile.am, and then set FLUX_SEC_DIRECTORY to that build directory in the sharness script. (I'm not sure how flux-core gets away with not setting the environment variable.)
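
A minimal sketch of the sharness side of that idea, assuming the keys were already generated in the build tree at make time. The helper name `use_build_tree_keys` and its argument are hypothetical; only `FLUX_SEC_DIRECTORY` comes from the proposal itself. Whether the key files sit directly in the given directory or in a `curve/` subdirectory beneath it is not settled in this thread, so the sketch accepts either.

```sh
# Hypothetical helper for t/sharness.d/sched-sharness.sh: point flux at
# keys generated in the build tree, if they exist.
# Assumption: $1 is the directory that was passed to flux keygen, and the
# "client" key file is either directly inside it or under curve/.
use_build_tree_keys() {
    secdir="$1"
    if [ -f "$secdir/client" ] || [ -f "$secdir/curve/client" ]; then
        FLUX_SEC_DIRECTORY="$secdir"
        export FLUX_SEC_DIRECTORY
        return 0
    fi
    # No keys found: leave the environment untouched and let the caller
    # decide whether to fall back to the user's own keys or skip.
    return 1
}

# Possible usage (BUILDDIR as in the comment above, value is an assumption):
#   use_build_tree_keys "$BUILDDIR/etc/flux/curve" ||
#       echo "no build-tree keys, falling back to user keys" >&2
```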

Jul 29 '20 05:07 SteVwonder

> I'm not sure how flux-core gets away with not setting the environment variable

There are a number of compiled-in paths that are altered if flux detects that it is running inside the flux-core source tree. The key dir is one of those. So we cheat - sorry!

It doesn't help now, but there is flux-framework/flux-core#2767 which would eliminate the need for users to have keys. I think we may want to bump this up in priority for our TOSS 4 deliverable since reading keys out of NFS directories is generally frowned upon in LC.

In the meantime, do those tests really need to start 4 brokers? The resource set is being provided as test input, and keys are only required if broker-to-broker connections need to be established. Edit: so changing test_under_flux 4 to test_under_flux 1 in the two hwloc tests would be another workaround.
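
For anyone who wants to try that workaround locally, here is a small sketch. The helper name `use_single_broker` is hypothetical, and it assumes the test file contains a literal `test_under_flux 4` call. Of the two hwloc tests, only `t4004-match-hwloc.t` is named in this issue, so the other file is left for the reader to fill in.

```sh
# Hypothetical helper: rewrite `test_under_flux 4` to `test_under_flux 1`
# in one sharness test file. The result is written back with cat so the
# file keeps its executable bit, and `sed -i` is avoided for portability.
use_single_broker() {
    file="$1"
    tmp="$file.tmp.$$"
    # Two expressions so that e.g. "test_under_flux 40" is left alone.
    sed -e 's/test_under_flux 4$/test_under_flux 1/' \
        -e 's/test_under_flux 4\([^0-9]\)/test_under_flux 1\1/' \
        "$file" >"$tmp" &&
        cat "$tmp" >"$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Possible usage from the t/ directory:
#   use_single_broker t4004-match-hwloc.t
```

Whether the tests still exercise what they are meant to with a single rank is a separate question, discussed in the following comment.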

Jul 29 '20 13:07 garlick

> So we cheat - sorry!

😆

> It doesn't help now, but there is flux-framework/flux-core#2767 which would eliminate the need for users to have keys.

If flux-framework/flux-core#2767 is going to solve this eventually anyway, I'd be happy to wait for a PR on that to land and by proxy handle this issue as well.

> In the meantime, do those tests really need to start 4 brokers?

Maybe there is a way around it, but I think 4 ranks are required so that we can run tests with 4 "nodes" worth of hwloc data (from 4 separate hwloc xml files, 1 per rank).

Jul 29 '20 16:07 SteVwonder
