yagna
Add a flag to Run command to wait for all output before exiting
What: Add a flag to the Run command that makes it wait for all output before exiting. Requestors that are prepared to wait can set the flag and are then guaranteed to receive all output produced by the task.
https://github.com/golemfactory/ya-runtime-vm/pull/150 contains the revert of the waiting mechanism, which needs to be adjusted so that it is conditional on the presence of the flag.
While fixing https://github.com/golemfactory/yagna/issues/2035 it became apparent that the VM runtime does not wait for all output to be sent to the exe-unit before sending the process-exited notification, and the exe-unit does not query the remaining output after receiving this notification. Adding this wait unconditionally had unexpected consequences and broke some examples, which showed that this is not a backward-compatible fix.
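To make the ordering problem concrete, here is a minimal, deterministic sketch in Python. It is a toy model and not the ya-runtime-vm or exe-unit code: `ToyRuntime`, `wait_for_output`, and `received_output` are hypothetical names, and the consumer is assumed to stop reading as soon as it sees the exit notification.

```python
from collections import deque


class ToyRuntime:
    """Toy stand-in for a runtime that forwards output and an exit notification."""

    def __init__(self, wait_for_output: bool):
        self.wait_for_output = wait_for_output  # hypothetical flag from this issue
        self.pending = deque()  # output produced but not yet forwarded
        self.sent = []          # events delivered to the consumer, in order

    def write(self, chunk: str) -> None:
        # The process produced output; it is buffered until forwarded.
        self.pending.append(chunk)

    def flush(self) -> None:
        # Forward everything that is still buffered.
        while self.pending:
            self.sent.append(("output", self.pending.popleft()))

    def exit(self, code: int) -> None:
        # With the flag set, drain the buffer before announcing the exit.
        if self.wait_for_output:
            self.flush()
        self.sent.append(("exited", code))


def received_output(events):
    """Consumer that stops reading at the exit notification (assumed behavior)."""
    out = []
    for kind, payload in events:
        if kind == "exited":
            break
        out.append(payload)
    return out


def run(wait_for_output: bool):
    rt = ToyRuntime(wait_for_output)
    rt.write("line 1")
    rt.write("line 2")
    rt.exit(0)
    return received_output(rt.sent)
```

In this model, `run(False)` loses both lines because the exit notification overtakes the buffered output, while `run(True)` delivers them at the cost of a later exit notification. That delay is the behavior change that makes an unconditional wait risky for existing users.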
Could be done together with #1840
result of SDK ticket: https://github.com/golemfactory/yapapi/issues/1091
There is a workaround: the output can be redirected to a file and downloaded.
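A rough sketch of that workaround, assuming the command runs through `/bin/sh` on the provider. The helper name `redirect_to_file` and the path `/golem/output/out.txt` are made up for illustration; downloading the file afterwards would go through whatever file-transfer call the SDK provides.

```python
import shlex


def redirect_to_file(argv, out_path):
    """Wrap a command so its stdout and stderr are written to out_path.

    Returns an argv list for /bin/sh; the caller is expected to run it on
    the provider and download out_path afterwards.
    """
    # Quote every argument so the inner shell sees the original command.
    inner = f"{shlex.join(argv)} > {shlex.quote(out_path)} 2>&1"
    return ["/bin/sh", "-c", inner]


# Hypothetical output location, chosen only for this example.
wrapped = redirect_to_file(["cat", "file.txt"], "/golem/output/out.txt")
```

Because the result travels as a file rather than through the command's stdout stream, it does not depend on the output being flushed before the exit notification.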
@prekucki, we would like to give this one priority, as the "large stdout output" issue is pending resolution on our end and we need this fixed in yagna.
After a conversation with @prekucki, we established that this issue will be tackled as part of the project that takes a holistic approach to the VM implementation.
Note: Some teams stumbled on this issue during the hackathon. Since I was on the spot I've provided them with a workaround. @prekucki, @golmek I don't think we can rely on our users knowing these workarounds. Technology should work in the first place. "Workarounds" can work in business projects where you can train a group of employees to deal with it systemically. This attitude is not going to work with developers using Golem. I wish to remove the "deferred" label and schedule a fix for this.
Additional info from the call with Rekuc, pointing out one more area where the problem can appear:
- `cat file.txt` <- There's a limited buffer size to prevent the explosion of the memory, we would have to document that in the SDK: Result contents can't be bigger than "X" at once.
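For illustration only, a size-capped output buffer could behave like the sketch below. The actual limit ("X") and what happens to excess data are not stated in this thread; keeping the first `limit` bytes and counting the rest as dropped is an assumption, and `CappedBuffer` is a hypothetical name.

```python
class CappedBuffer:
    """Keeps at most `limit` bytes of output and counts what did not fit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.data = bytearray()
        self.dropped = 0  # bytes discarded because the cap was reached

    def write(self, chunk: bytes) -> None:
        room = self.limit - len(self.data)
        kept = chunk[:max(room, 0)]  # assumption: keep the head, drop the tail
        self.data += kept
        self.dropped += len(chunk) - len(kept)

    @property
    def truncated(self) -> bool:
        return self.dropped > 0
```

With a cap like this, a command such as `cat file.txt` on a large file would return only part of the contents, so the SDK documentation would need to state the limit and ideally expose something like the `truncated` indicator.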
@prekucki, can this be the reason for the following behavior: I'm running a command on the provider, the command completes, and the `stdout` is empty? I'm running the same task on multiple providers on the network, and in some cases I get a positive result of the execution but the `stdout` is missing.