flux-core shell: input: stop writing stdin when reader is not ready

Open grondo opened this issue 1 year ago • 1 comments

The stdin shell plugin reads input data from the KVS and tries to write it to the destination task(s), even if the tasks aren't reading stdin. If the task is stopped (due to a debugging session) or otherwise isn't reading stdin, the buffers fill and the job is killed with:

2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device

This is simple to reproduce given a large input file (here a 15MB file called 15M):

$ flux run --input=15M sleep inf
2.482s: job.exception type=exec severity=0 flux_subprocess_write: No space left on device
flux-job: task(s) exited with exit code 1
2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device

May 15 '24 15:05 grondo

Related #2459 :disappointed: There are some ideas for "solutions" in that issue.

I had to remind myself how this works. Even if stdin is set to a file, the file contents are read and streamed to a guest.input eventlog. Each task then separately watches the guest.input eventlog and sends the contents of each data event to the task. If the task subprocess internal buffer fills, then ENOSPC is returned and a fatal job exception is raised.
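To make the failure mode above concrete, here is a toy model in Python. It is not flux-core code: `TaskStdinBuffer` and `replay_eventlog` are hypothetical names, and the buffer capacity and event sizes are made up. It only illustrates what happens when data events are pushed into a bounded buffer that nobody is draining.

```python
import errno


class TaskStdinBuffer:
    """Hypothetical stand-in for a task subprocess's bounded stdin buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0  # bytes queued; never drained because the task is not reading

    def write(self, data):
        # Refuse the write outright when it does not fit, as in the report.
        if self.used + len(data) > self.capacity:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.used += len(data)
        return len(data)


def replay_eventlog(events, buf):
    """Push every data event into the buffer with no flow control.

    Returns (number of events delivered, errno of the failure or None).
    """
    for n, data in enumerate(events):
        try:
            buf.write(data)
        except OSError as exc:
            return n, exc.errno
    return len(events), None


# 16 KiB of input split into 1 KiB data events, written to a 4 KiB buffer.
events = [b"x" * 1024] * 16
delivered, err = replay_eventlog(events, TaskStdinBuffer(capacity=4096))
print(delivered, errno.errorcode.get(err))
```

In this model the sender has no way to learn that the reader is stalled until a write fails, which is why the only available reaction is an error.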

It is going to be difficult to do flow control via an eventlog, though @chu11 presents some ideas in #2459. Maybe for a first cut, file input could read from the file per shell and write directly to each task, skipping the KVS (the rank 0 shell could put a redirect event in the eventlog). When the buffer fills, it is much easier to stop the fd watcher than a streaming RPC (I'm not sure there is a way to stop these?)
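A rough sketch of the "stop the fd watcher when the buffer fills" idea, again as a Python simulation rather than flux-core code. `BoundedBuffer`, `pump`, the tick-based loop, and all sizes are assumptions made for illustration. The point is only that when the input side reads directly from a file, it can pause reading while the buffer is full and resume once the task drains it, so no write ever fails.

```python
import errno
import io


class BoundedBuffer:
    """Hypothetical bounded buffer between the shell and one task."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def space(self):
        return self.capacity - len(self.data)

    def write(self, chunk):
        if len(chunk) > self.space():
            raise OSError(errno.ENOSPC, "No space left on device")
        self.data += chunk

    def drain(self, n):
        out = bytes(self.data[:n])
        del self.data[:n]
        return out


def pump(source, buf, chunk_size, task_reads):
    """Simulate one shell feeding one task.

    task_reads gives the number of bytes the task consumes on each tick
    (0 means the task is stopped). Returns (bytes the task received,
    number of times the input watcher was stopped).
    """
    watcher_active = True
    eof = False
    stops = 0
    received = bytearray()
    for nread in task_reads:
        if watcher_active and not eof:
            if buf.space() < chunk_size:
                watcher_active = False  # buffer full: stop reading the file
                stops += 1
            else:
                chunk = source.read(chunk_size)
                if chunk:
                    buf.write(chunk)
                else:
                    eof = True
        received += buf.drain(nread)
        if not watcher_active and buf.space() >= chunk_size:
            watcher_active = True  # room again: resume reading
    return bytes(received), stops


payload = bytes(range(256)) * 32                # 8 KiB of input
task_reads = [0] * 5 + [1024] * 20              # task is stopped for 5 ticks
received, stops = pump(io.BytesIO(payload), BoundedBuffer(2048), 1024, task_reads)
print(len(received), stops)
```

The same pause and resume is harder to express when the data arrives as eventlog entries over a streaming RPC, since the sender in that case is not a local file descriptor the shell can simply stop polling.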

There would still be a problem with flow control when getting input from an eventlog though :thinking: so perhaps it would be better to figure out how to solve that problem anyway.

May 16 '24 02:05 grondo

I think we can close this one after #6005 since the file input method no longer goes through the KVS. We will keep #2459 open to track the lack of flow control in the "service" or interactive input implementation.

May 23 '24 22:05 grondo
