flux-sched
`t8001` fails in CI, bracket mismatch?
It looks like there's a mismatch in t8001-util-ion-R.t:
not ok 6 - fluxion-R: encoding properties on heterogeneity works
#
# cat <<-EOF >expected6 &&
# /cluster0 -1 {}
# /cluster0/foo2 0 {"arm-v9@core":""}
# /cluster0/foo2/core0 0 {}
# /cluster0/foo2/core1 0 {}
# /cluster0/foo2/gpu0 0 {}
# /cluster0/foo2/gpu1 0 {}
# /cluster0/foo3 2 {"arm-v9@core":"","amd-mi60@gpu":""}
# /cluster0/foo3/core0 2 {}
# /cluster0/foo3/core1 2 {}
# /cluster0/foo3/gpu0 2 {}
# /cluster0/foo3/gpu1 2 {}
# /cluster0/foo1 3 {"arm-v9@core":"","amd-mi60@gpu":""}
# /cluster0/foo1/core0 3 {}
# /cluster0/foo1/core1 3 {}
# /cluster0/foo1/gpu0 3 {}
# /cluster0/foo1/gpu1 3 {}
# /cluster0/foo4 1 {"arm-v8@core":""}
# /cluster0/foo4/core0 1 {}
# EOF
# flux R encode -r 0 -c 0-1 -g 0-1 -p "arm-v9@core:0" -H foo2 > out6 &&
# flux R encode -r 1 -c 0 -H foo3 -p "arm-v8@core:1" >> out6 &&
# flux R encode -r 2-3 -c 0-1 -g 0-1 -p "arm-v9@core:2-3" \
# -p "amd-mi60@gpu:2-3" -H foo[1,4] >> out6 &&
# cat out6 | flux R append > combined6.json &&
# cat combined6.json | flux ion-R encode > augmented6.json &&
# jq .scheduling augmented6.json > jgf6.json &&
# print_schema2 jgf6.json paths6 &&
# test_cmp expected6 paths6
#
# failed 1 among 6 test(s)
The diff shows that the actual output has JSON arrays where the test expects JSON objects, which is odd:
(s=33,d=0) fluxci@tioga10 /usr/WS1/fluxci/cibuilds/399712_tioga/flux-sched (master)$ diff trash-directory.t8001-util-ion-R/paths6 trash-directory.t8001-util-ion-R/expected6
1,18c1,18
< /cluster0 -1 []
< /cluster0/foo2 0 ["arm-v9@core"]
< /cluster0/foo2/core0 0 []
< /cluster0/foo2/core1 0 []
< /cluster0/foo2/gpu0 0 []
< /cluster0/foo2/gpu1 0 []
< /cluster0/foo3 2 ["arm-v9@core","amd-mi60@gpu"]
< /cluster0/foo3/core0 2 []
< /cluster0/foo3/core1 2 []
< /cluster0/foo3/gpu0 2 []
< /cluster0/foo3/gpu1 2 []
< /cluster0/foo1 3 ["arm-v9@core","amd-mi60@gpu"]
< /cluster0/foo1/core0 3 []
< /cluster0/foo1/core1 3 []
< /cluster0/foo1/gpu0 3 []
< /cluster0/foo1/gpu1 3 []
< /cluster0/foo4 1 ["arm-v8@core"]
< /cluster0/foo4/core0 1 []
---
> /cluster0 -1 {}
> /cluster0/foo2 0 {"arm-v9@core":""}
> /cluster0/foo2/core0 0 {}
> /cluster0/foo2/core1 0 {}
> /cluster0/foo2/gpu0 0 {}
> /cluster0/foo2/gpu1 0 {}
> /cluster0/foo3 2 {"arm-v9@core":"","amd-mi60@gpu":""}
> /cluster0/foo3/core0 2 {}
> /cluster0/foo3/core1 2 {}
> /cluster0/foo3/gpu0 2 {}
> /cluster0/foo3/gpu1 2 {}
> /cluster0/foo1 3 {"arm-v9@core":"","amd-mi60@gpu":""}
> /cluster0/foo1/core0 3 {}
> /cluster0/foo1/core1 3 {}
> /cluster0/foo1/gpu0 3 {}
> /cluster0/foo1/gpu1 3 {}
> /cluster0/foo4 1 {"arm-v8@core":""}
> /cluster0/foo4/core0 1 {}
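To make the mismatch explicit, here is a minimal sketch that parses the third column of a `paths6`/`expected6` line and reports its JSON container type. The helper name `property_shape` is hypothetical, and the sketch assumes each line has the form `<path> <rank> <properties-json>`, as in the diff above.

```python
import json


def property_shape(line):
    """Return the Python type name of the JSON properties column in one line."""
    # Assumed line layout: <path> <rank> <properties-json>
    _path, _rank, props = line.split(maxsplit=2)
    return type(json.loads(props)).__name__


actual = '/cluster0/foo2 0 ["arm-v9@core"]'
expected = '/cluster0/foo2 0 {"arm-v9@core":""}'

# The actual output carries a list, the expected output a dict.
print(property_shape(actual), property_shape(expected))
```

Both sides parse as valid JSON, so the difference is in how the properties are encoded rather than in the output being malformed.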
I'm not sure what's going on here but the relevant code was changed in https://github.com/flux-framework/flux-sched/pull/1149
I wonder if somehow an older version of the Python FluxionResourceGraphV1 class is being picked up? Like maybe there's another version of its module in sys.path for some reason?
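One way to check this would be to ask Python where it would load the module from. This is only a sketch: the helper `module_origin` is hypothetical, and the module path `fluxion.resourcegraph.V1` is an assumption about where `FluxionResourceGraphV1` lives, so adjust it if the class is defined elsewhere.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a module would be loaded from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return None
    return spec.origin if spec is not None else None


# Assumed module path for FluxionResourceGraphV1; not confirmed in this thread.
print(module_origin("fluxion.resourcegraph.V1"))

# Entries are searched in order, so the first match wins.
for entry in sys.path:
    print(entry)
```

Running this with the same environment the failing test uses should show whether the module resolves to the build directory or to an installed copy.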
These are the others that fail FYI:
26:t1018-rv1-bootstrap2.t
61:t3027-resource-RV.t
71:t3301-system-latestart.t
89:t8001-util-ion-R.t
It's definitely that change that made it break on LC, but I wonder why. I'll check the sys.path, but could it have to do with our version of jq (jq-1.6)? Although I'd expect more failures if that were the case...
If you want to poke at these failing tests, you can xsu fluxci and see the logs, along with the binary they're running under:
cd /usr/WS1/fluxci/cibuilds/399712_tioga/flux-sched/
ctest -j16 --rerun-failed --output-on-failure
> I wonder if somehow an older version of the Python FluxionResourceGraphV1 class is being picked up? Like maybe there's another version of its module in sys.path for some reason?
That's a good guess since the same tests do not fail in GitHub CI. I wonder if the tests are appending instead of prepending the path to the builddir Fluxion Python modules. The CI @wihobbs is talking about here is the GitLab CI, which runs on a system with flux-sched RPMs installed.
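The append-versus-prepend effect can be illustrated with a small self-contained sketch. Nothing here is flux-sched code: the module name `demo_shadowed_mod` and the two temporary directories are hypothetical stand-ins for an installed (RPM) copy and a builddir copy of the same module.

```python
import importlib
import os
import sys
import tempfile


def which_copy_wins(prepend):
    """Create two copies of one module and report which one Python imports."""
    name = "demo_shadowed_mod"  # hypothetical module name
    with tempfile.TemporaryDirectory() as installed, \
            tempfile.TemporaryDirectory() as builddir:
        for directory, label in ((installed, "installed"), (builddir, "builddir")):
            with open(os.path.join(directory, name + ".py"), "w") as f:
                f.write(f"ORIGIN = {label!r}\n")
        saved = list(sys.path)
        try:
            # Stand-in for a system site-packages directory holding the RPM copy.
            sys.path.insert(0, installed)
            if prepend:
                sys.path.insert(0, builddir)  # builddir is searched first
            else:
                sys.path.append(builddir)     # installed copy is found first
            sys.modules.pop(name, None)
            importlib.invalidate_caches()
            return importlib.import_module(name).ORIGIN
        finally:
            # Restore interpreter state so the demo has no side effects.
            sys.path[:] = saved
            sys.modules.pop(name, None)


print(which_copy_wins(prepend=True))   # builddir
print(which_copy_wins(prepend=False))  # installed
```

If the test harness appends the builddir path, a system with the RPMs installed would behave like the second call, which would match the tests passing in GitHub CI but failing here.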
Is this still failing somewhere?
Nope.