
Add a flag to Run command to wait for all output before exiting

Open pwalski opened this issue 2 years ago • 7 comments

What: Add a flag to the Run command that waits for all output before exiting, so that requestors who are prepared to wait can signal it and are guaranteed to receive all output produced by the task.

https://github.com/golemfactory/ya-runtime-vm/pull/150 contains the revert of the waiting mechanism, which needs to be adjusted so that it is conditional on the presence of the flag.

While fixing https://github.com/golemfactory/yagna/issues/2035 it became apparent that the vm runtime does not wait for all output to be sent to exe-unit before sending the process-exited notification, and exe-unit does not query the remaining output after receiving this notification. Adding this wait unconditionally had unexpected consequences and broke some examples, which showed that this is not a backward-compatible fix.
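To illustrate the race described above, here is a minimal local sketch in Python. It is not the ya-runtime-vm or exe-unit code; the function `run` and its `wait_for_output` parameter are hypothetical names that only mimic the idea of an opt-in flag that drains the output before the exit is reported.

```python
import subprocess
import sys
import threading


def run(cmd, wait_for_output=False):
    """Run cmd and return (exit_code, output collected when the exit is reported).

    Hypothetical sketch: `wait_for_output` stands in for the proposed flag.
    """
    chunks = []
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    def pump():
        # Forward output line by line, the way a runtime would forward it
        # to its supervisor.
        for line in proc.stdout:
            chunks.append(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    code = proc.wait()  # the process has exited at this point
    if wait_for_output:
        # Opt-in behavior: drain everything still in the pipe
        # before the exit is reported.
        reader.join()

    # Without the flag this snapshot may miss output that is still in flight.
    return code, b"".join(chunks)


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('x' * 100000)"]
    code, out = run(cmd, wait_for_output=True)
    print(code, len(out.strip()))
```

With `wait_for_output=False` the amount of output returned depends on timing, which matches the symptom of a successful exit code with missing stdout. Keeping the wait behind a flag leaves the default behavior unchanged for existing callers.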

pwalski avatar Feb 28 '23 18:02 pwalski

Could be done together with #1840

pwalski avatar Mar 01 '23 08:03 pwalski

result of SDK ticket: https://github.com/golemfactory/yapapi/issues/1091

golmek avatar Apr 04 '23 15:04 golmek

There is a workaround: the output can be redirected to a file and downloaded.
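A rough sketch of that workaround, assuming the task is started through a shell on the provider: wrap the command so that stdout and stderr go to a file, then fetch that file with whatever file-transfer step the SDK offers. The helper `redirect_to_file` and the path `/golem/output/stdout.log` are made up for illustration.

```python
import shlex


def redirect_to_file(cmd, path):
    """Wrap cmd in a shell call that writes stdout and stderr to path.

    Hypothetical helper; the returned argv would be passed to the Run command.
    """
    shell_line = f"{shlex.join(cmd)} > {shlex.quote(path)} 2>&1"
    return ["/bin/sh", "-c", shell_line]


if __name__ == "__main__":
    # The output file would then be downloaded in a separate step.
    print(redirect_to_file(["cat", "file.txt"], "/golem/output/stdout.log"))
```

The file is complete once the process has exited, so downloading it afterwards does not depend on how much stdout reached exe-unit.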

golmek avatar Apr 05 '23 09:04 golmek

@prekucki, we would like to prioritize this one: the "large stdout output" issue is pending resolution on our end, and we need this fixed in yagna.

grisha87 avatar Nov 13 '23 11:11 grisha87

After a conversation with @prekucki, we established that this issue will be tackled as part of the project that takes a holistic approach to the VM implementation.

grisha87 avatar Feb 07 '24 13:02 grisha87

Note: Some teams stumbled on this issue during the hackathon. Since I was on the spot I've provided them with a workaround. @prekucki, @golmek I don't think we can rely on our users knowing these workarounds. Technology should work in the first place. "Workarounds" can work in business projects where you can train a group of employees to deal with it systemically. This attitude is not going to work with developers using Golem. I wish to remove the "deferred" label and schedule a fix for this.

Additional info from the call with Rekuc, pointing out one more area where the problem can appear:

  • cat file.txt <- There's a limited buffer size to keep memory use from exploding, so we would have to document in the SDK that the Result contents can't be bigger than "X" at once.
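To make the buffer limit concrete, here is a small sketch of a capped capture buffer. It does not reflect the actual yagna implementation, and the real limit (the "X" above) is not known here; `BoundedOutput` and its `limit` are hypothetical.

```python
class BoundedOutput:
    """Keep at most `limit` bytes of output and remember whether any was dropped."""

    def __init__(self, limit):
        self.limit = limit          # hypothetical cap, stands in for "X"
        self.data = bytearray()
        self.truncated = False

    def write(self, chunk):
        room = self.limit - len(self.data)
        if len(chunk) > room:
            # Anything beyond the cap is dropped, and the caller can tell.
            self.truncated = True
        self.data += chunk[:max(room, 0)]


if __name__ == "__main__":
    buf = BoundedOutput(limit=4)
    buf.write(b"abc")
    buf.write(b"def")
    print(bytes(buf.data), buf.truncated)
```

Exposing something like the `truncated` marker would let the SDK report that a Result was cut off instead of silently returning partial output.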

grisha87 avatar Feb 22 '24 09:02 grisha87

@prekucki, could this be the reason for the following behavior: I run a command on the provider, the command completes, and the stdout is empty? I'm running the same task on multiple providers on the network, and in some cases I get a successful execution result but the stdout is missing.

grisha87 avatar Feb 26 '24 14:02 grisha87