Mark Grondona
Well a simple fix would be to just put everything under `/etc/flux/modprobe`, i.e. we'd have:
```
/etc/flux/modprobe
/etc/flux/modprobe/modprobe.toml
/etc/flux/modprobe/modprobe.d/
/etc/flux/modprobe/rc1.py
/etc/flux/modprobe/rc1.d/
/etc/flux/modprobe/rc3.py
/etc/flux/modprobe/rc3.d/
```
Ok, this PR now places everything modprobe-related under `/etc/flux/modprobe`. I've also added a rough draft `flux-modprobe(1)` manpage, which should help reviewers or anyone else who wants to look over the interface...
Ok, I guess I'll set MWP here. This seems fairly safe since it doesn't do anything by default yet...
> I wonder if we could add an exec_bypass flag and let this plugin use both flags and post alloc (adding a fake or empty R), free, start, and finish...
I haven't fully read through the existing code to see where we're at, but here's some quick thoughts based on conversation above: Make sure you set the `alloc-bypass` flag _before_...
Ok, probably not a `hostlist_count` bug, the problem was I was calling
```
$ flux R parse-config /etc/flux/system/conf.d | flux R decode --nodelist | flux hostlist --count
```
Which didn't...
IIRC, LC sets `TMPDIR` to `/var/tmp/username`. Flux creates a job tmpdir in `TMPDIR`. One solution would just be to override the LC `TMPDIR` for containers. I'm not sure it is...
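A minimal sketch of the override idea above, assuming the container is launched through a wrapper you control. The `run_with_tmpdir` helper is hypothetical (it is not part of Flux or any LC tooling), and `/tmp` as the replacement directory is an assumption; substitute whatever path is writable inside the container.

```shell
#!/bin/sh
# Hypothetical helper, not a Flux command: run a command with TMPDIR
# overridden for that command only.
run_with_tmpdir() {
    dir=$1
    shift
    # env(1) sets the variable in the child's environment without
    # changing TMPDIR in the calling shell.
    env TMPDIR="$dir" "$@"
}

# Assumption: /tmp exists and is writable where the command runs.
run_with_tmpdir /tmp sh -c 'echo "child sees TMPDIR=$TMPDIR"'
```

Because the override is scoped to the child process, the site-wide `TMPDIR` setting stays in effect for everything else in the session.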
I think we'd need tooling. Sysadmins could either record the reason manually, or it could be done from a script, since the return of nodes to service is often automated....
Yeah, I wasn't thinking of the notification service specifically here, but it's a good thought that a notification service could be extended in this manner. Root cause determination could...
I see these fail occasionally in GitHub CI as well. My guess is each has some kind of race condition.