fleet
Bug: Orbit currently does not respect the script timeout, so a modified timeout is not respected either.
Fleet version:
Web browser and operating system:
💥 Actual behavior
On macOS, a script timeout doesn't take effect until the command currently running in the script completes. For example, running a script that includes sleep 10 with a timeout of 5s will time out after 10s, not after 5s. This may be desired behavior, as killing a command before completion could leave the device in an unstable state. However, this does not occur on Windows devices, as the implementation there is slightly different.
🧑‍💻 Steps to reproduce
On macOS:
- Run a script containing sleep 10 with a timeout of 5 seconds.
- The script run will not send a response until the 10-second mark, at which point it states that it timed out after 5 seconds.
🕯️ More info (optional)
Part of this may be desired behavior, as killing some commands (like an OS update) may leave the computer in an unknown state. If that's the case, we should only document the behavior and adjust the returned error message to reflect the actual runtime interval.
Another proposed solution is to introduce a "force timeout" option when running a script.
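To make the two options above concrete, here is a minimal Python sketch (not Orbit's actual code, which this thread does not show). The names `run_script`, `ScriptResult`, `describe`, and the `force_timeout` parameter are all hypothetical. Without the flag, the runner lets the command finish and reports the real runtime. With it, the runner kills the script's whole process group when the timeout elapses. It assumes a POSIX host with `sh` available.

```python
import os
import signal
import subprocess
import time
from dataclasses import dataclass


@dataclass
class ScriptResult:
    exit_code: int          # negative value means "killed by that signal number"
    timed_out: bool         # True if the script was still running at the timeout
    runtime_seconds: float  # wall-clock time the script actually ran


def run_script(command: str, timeout: float, force_timeout: bool = False) -> ScriptResult:
    """Hypothetical runner; `force_timeout` is the proposed option, not an existing flag."""
    start = time.monotonic()
    # start_new_session=True gives the shell its own process group, so the
    # shell and any commands it spawned can be signalled together (POSIX only).
    proc = subprocess.Popen(["sh", "-c", command], start_new_session=True)
    timed_out = False
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True
        if force_timeout:
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group exited on its own just before the kill
        # Without force_timeout this blocks until the running command completes.
        proc.wait()
    return ScriptResult(proc.returncode, timed_out, time.monotonic() - start)


def describe(result: ScriptResult, timeout: float) -> str:
    """Build a message that reports the actual runtime, not just the configured timeout."""
    if not result.timed_out:
        return f"script finished in {result.runtime_seconds:.1f}s"
    return (f"script exceeded the {timeout:g}s timeout; "
            f"actual runtime was {result.runtime_seconds:.1f}s")
```

For example, `run_script("sleep 10", timeout=5)` would return only after about 10 seconds with `timed_out=True`, while `run_script("sleep 10", timeout=5, force_timeout=True)` would return after about 5 seconds. Killing the process group rather than only the shell is a design choice made here so that a long-running child command does not outlive the timeout.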
@sharon-fdm Reminder to use bug template and populate as much information as possible in the body section.
Hey @zayhanlon heads up that a fix for this bug is not targeted to ship in the next Fleet release (4.46)
After sprint planning today, we decided to prioritize the "Support Zero Trust workflow w/ live queries: 6 queries on 13k hosts" story (#17379) instead.
cc @sharon-fdm
Waiting for @mostlikelee to provide details
@mostlikelee this is something we identified when working with the scripts. Could you please fill in the details on this. (I can't recall which OSs have this issue and in what conditions.)
We may need some product input here, but the issue is that the script timeout doesn't take effect until the command currently running in the script completes. For example, running a script that includes sleep 10 with a timeout of 5s will time out after 10s, not after 5s. This may be desired behavior, as killing a command before completion could leave the device in an unstable state. However, this does not occur on Windows devices, as the implementation there is slightly different. My findings were on macOS, and I suspect this also occurs on Linux.
@noahtalerman curious on your thoughts here as to when a script timeout should take effect.
@nonpunctual @spokanemac curious about your thoughts on this
@mostlikelee, I have no idea how this actually works. How I'd expect it to work is to capture the PID of the process, log a timestamp, and fork a background process that sleeps for the timeout duration. At the end of the timeout, check whether the PID still exists, and kill it if so.
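The watchdog approach described in this comment could look roughly like the Python sketch below. It is an illustration only, not how Orbit is implemented. The function name `run_with_watchdog` is hypothetical, and a `threading.Timer` stands in for the forked background process that sleeps for the timeout.

```python
import subprocess
import threading
import time


def run_with_watchdog(argv, timeout):
    """Run argv; kill it if it is still alive once `timeout` seconds have passed.

    Returns (exit_code, killed_by_watchdog, elapsed_seconds).
    """
    proc = subprocess.Popen(argv)   # proc.pid is the captured PID
    started = time.monotonic()      # timestamp logged at launch
    killed = threading.Event()

    def kill_if_still_running():
        # Runs after the timeout: check whether the process exists, then kill it.
        if proc.poll() is None:
            proc.kill()
            killed.set()

    # The timer plays the role of the background process sleeping for the timeout.
    watchdog = threading.Timer(timeout, kill_if_still_running)
    watchdog.start()
    try:
        exit_code = proc.wait()
    finally:
        watchdog.cancel()           # no-op if the watchdog has already fired
    return exit_code, killed.is_set(), time.monotonic() - started
```

Note that this sketch only kills the process it started. If that process is a shell that spawned other commands, those children would need to be handled separately, for example by signalling the whole process group.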
@mostlikelee
As an admin, I should be able to:
- run a script locally to test it
- have Fleet run the same script with the same behavior & results on hosts as my local test
That's all. I know that's not a direct answer but I think this is what Fleet admins who upload scripts expect. Thanks.
@spokanemac @nonpunctual thanks for the feedback. I believe the script timeout config exists primarily to protect admins in case they accidentally write something in their scripts that takes too long or never exits, like an infinite loop or a command that never ends. The current implementation should take care of the infinite loop, but will not protect against a command that never ends. Maybe that's a corner case we shouldn't be worrying about at the moment.
If we change the behavior to kill in-progress commands, it could become a footgun if the timeout is set to a low value and kills something important, like an OS update in progress. I'm guessing that killing an OS update at the wrong time may leave the device in a bad state.
After speaking with @nonpunctual we believe the current behavior is indeed a bug. The bug details have been updated.
Script timeout lapse, Like a falling leaf delayed, Now finds timely path.