flux-core
shell: input: stop writing stdin when reader is not ready
The stdin shell plugin reads input data from the KVS and tries to write it to the destination task(s), even if the tasks aren't reading stdin. If a task is stopped (e.g. due to a debugging session) or otherwise isn't reading stdin, the buffers fill and the job is killed with:
2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device
This is simple to reproduce given a large input file (here a 15MB file called 15M):
$ flux run --input=15M sleep inf
2.482s: job.exception type=exec severity=0 flux_subprocess_write: No space left on device
flux-job: task(s) exited with exit code 1
2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device
Related: #2459 :disappointed: There are some ideas for "solutions" in that issue.
I had to remind myself how this works. Even if stdin is set to a file, the file contents are read and streamed to a guest.input eventlog. Each task then separately watches the guest.input eventlog and sends the contents of each data event to the task. If the task subprocess's internal buffer fills, then ENOSPC is returned and a fatal job exception is raised.
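To make the failure mode concrete, here is a minimal toy model of that path. It does not use any Flux API: TaskStdinBuffer and replay_eventlog are hypothetical names, and the buffer size is arbitrary. It only illustrates that a producer replaying data events with no throttling will hit ENOSPC as soon as the consumer stops draining a fixed-size buffer.

```python
import errno


class TaskStdinBuffer:
    """Hypothetical stand-in for a task's fixed-size stdin buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0  # bytes queued and not yet consumed by the task

    def write(self, data):
        # A stopped task never drains the buffer, so 'used' only grows.
        if self.used + len(data) > self.capacity:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.used += len(data)
        return len(data)


def replay_eventlog(events, buf):
    """Push every data event at the buffer; nothing throttles the producer."""
    delivered = 0
    for data in events:
        buf.write(data)  # raises ENOSPC once the buffer is full
        delivered += len(data)
    return delivered
```

With a 4-byte buffer and three 2-byte events, the third write fails, which corresponds to the fatal exception in the report above.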
It is going to be difficult to do flow control via an eventlog, though @chu11 presents some ideas in #2459. Maybe for a first cut, file input could read from the file per shell and write directly to each task, skipping the KVS (the rank 0 shell could put a redirect event in the eventlog). When the buffer fills, it is much easier to stop an fd watcher than a streaming RPC (I'm not sure there is a way to stop these?)
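As a rough illustration of the "stop reading when the buffer fills" idea, here is a sketch using plain file descriptors and select() rather than the Flux reactor. InputPump is a hypothetical name, and this is not the implementation in flux-core. It assumes the destination is a pipe-like fd. The point is only that with a direct fd-to-fd copy, a stalled reader can be handled by pausing and resuming later instead of raising a fatal error.

```python
import os
import select


class InputPump:
    """Copy src_fd to dst_fd, pausing whenever dst_fd cannot accept data."""

    def __init__(self, src_fd, dst_fd, chunk=4096):
        self.src_fd = src_fd
        self.dst_fd = dst_fd
        self.chunk = chunk
        self.pending = b""  # data read from the file but not yet written
        self.written = 0
        os.set_blocking(dst_fd, False)

    def run(self, timeout=0.1):
        """Copy until EOF ("done") or until the reader stalls ("stalled").

        Calling run() again after "stalled" resumes where it left off.
        """
        while True:
            if not self.pending:
                self.pending = os.read(self.src_fd, self.chunk)
                if not self.pending:
                    return "done"  # EOF on the input file
            _, writable, _ = select.select([], [self.dst_fd], [], timeout)
            if not writable:
                return "stalled"  # reader not ready: pause, do not fail
            try:
                n = os.write(self.dst_fd, self.pending)
            except BlockingIOError:
                continue  # lost a race with the pipe filling; wait again
            self.written += n
            self.pending = self.pending[n:]
```

In an event loop, "stalled" would correspond to stopping the input fd watcher and restarting it once the destination becomes writable again.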
There would still be a problem with flow control when getting input from an eventlog though :thinking: so perhaps it would be better to figure out how to solve that problem anyway.
I think we can close this one after #6005 since the file input method no longer goes through the KVS. We will keep #2459 open to track the lack of flow control in the "service" or interactive input implementation.