Prevent idle computer from sleeping while flow is running

Open fumoboy007 opened this issue 1 year ago • 7 comments

As far as I can tell, Metaflow does not prevent idle sleep, which means the computer may go to sleep while a long-running flow is still in progress. Metaflow could instead use one of several existing libraries to prevent idle sleep. For example, caffeine uses macOS’s system API to prevent idle sleep.

fumoboy007 avatar Sep 16 '24 23:09 fumoboy007

You can already do this today, out of the box, on macOS. Just try executing:

caffeinate -i bash -c "python coreweave_flow.py run"

:)
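
The same wrapping can be done from a small Python launcher instead of typing the prefix by hand. This is only a sketch: `with_caffeinate` and `run_flow` are hypothetical helper names, not part of Metaflow, and the command falls back to running unchanged when no `caffeinate` binary is found.

```python
import shutil
import subprocess
import sys


def with_caffeinate(cmd, caffeinate_path):
    """Prefix cmd with `caffeinate -i` when a caffeinate binary is available.

    Hypothetical helper for illustration; it is not part of Metaflow.
    """
    if not caffeinate_path:
        # No caffeinate binary found: run the command unchanged.
        return list(cmd)
    # -i asks macOS to hold off idle sleep while the wrapped command runs.
    return [caffeinate_path, "-i", *cmd]


def run_flow(script, *args):
    """Run `python <script> <args>` under caffeinate when possible."""
    cmd = [sys.executable, script, *args]
    # shutil.which returns None when the binary is not installed.
    return subprocess.call(with_caffeinate(cmd, shutil.which("caffeinate")))
```

Calling `run_flow("coreweave_flow.py", "run")` would then behave like the command above on a Mac and like a plain `python coreweave_flow.py run` elsewhere.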

savingoyal avatar Sep 17 '24 03:09 savingoyal

@savingoyal Yes! I’m aware. However, the responsibility should be on the program to do it so that the user does not need to think about it. For example, if a web browser is currently downloading a file, the user expects idle sleep to be prevented automatically.

fumoboy007 avatar Sep 17 '24 05:09 fumoboy007

It's a good question if metaflow should interfere with the system settings of the host computer. I am unaware of any other program of a similar nature that hijacks the sleep settings. Regardless, there is always a way out with using solutions like caffeinate.

savingoyal avatar Sep 17 '24 05:09 savingoyal

Hmm I think there is a misunderstanding here. Let me elaborate.

> It's a good question if metaflow should interfere with the system settings of the host computer.

To be clear, we are not talking about changing the system settings. We are talking about using an official system API to tell the OS to temporarily prevent idle sleep during a long-running operation.

> I am unaware of any other program of a similar nature that hijacks the sleep settings.

As I mentioned in https://github.com/Netflix/metaflow/issues/2032#issuecomment-2354546305, it is common practice for programs to temporarily prevent idle sleep during long-running operations. Examples:

  • A web browser prevents idle sleep when downloading a file.
  • A media player prevents idle and display sleep when playing media.
  • A presentation app prevents idle and display sleep while presenting.

Here are some example command-line programs from this GitHub search:

> Regardless, there is always a way out with using solutions like caffeinate.

Yes, this is a good workaround. However, I created this issue because I think Metaflow should automatically prevent idle sleep while running a flow.

I imagine the following is a common scenario:

  1. A user starts running a flow to train a neural network.
  2. The flow takes a while. The user goes for a coffee break.
  3. The computer goes to sleep, preventing progress on the flow until the user comes back.

That’s why I think it would be a good default for Metaflow to prevent idle sleep while running a flow. Can you think of any scenarios where the user would not want idle sleep to be prevented while running a flow?
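
For illustration, a guard like this could be held from inside the running process without any new Python dependency. The sketch below is an assumption about how that might look, not how Metaflow works: `prevent_idle_sleep` and `run_guarded` are hypothetical names, it only covers macOS (by spawning `caffeinate -i -w <pid>`), and it is a no-op everywhere else.

```python
import contextlib
import os
import shutil
import subprocess
import sys


@contextlib.contextmanager
def prevent_idle_sleep():
    """Best-effort idle-sleep guard; hypothetical, not a Metaflow API."""
    helper = None
    caffeinate = shutil.which("caffeinate") if sys.platform == "darwin" else None
    if caffeinate:
        # -i prevents idle sleep; -w makes caffeinate exit when this process
        # exits, so the guard is released even after a crash.
        helper = subprocess.Popen([caffeinate, "-i", "-w", str(os.getpid())])
    try:
        # Yield True if a guard is actually active, False for the no-op case.
        yield helper is not None
    finally:
        if helper is not None:
            helper.terminate()
            helper.wait()


def run_guarded(run):
    """Call run() while the guard is held and return its result."""
    with prevent_idle_sleep():
        return run()
```

The guard is released as soon as the block exits, so nothing about the machine's sleep settings is changed permanently.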

fumoboy007 avatar Sep 17 '24 07:09 fumoboy007

For long-running workflows, there can be other sources of interruption as well (the machine could crash for any arbitrary reason). That's why we recommend deploying the flow to Step Functions, Airflow, or Argo Workflows to gain an additional layer of resiliency.

savingoyal avatar Sep 17 '24 15:09 savingoyal

Sure, the computer or Metaflow could crash at any time and the user will have to deal with it. This matches the mental model of the user operating their computer—apps crash all the time, the kernel crashes from time to time, etc.

But that same mental model also prescribes that programs with important, long-running operations automatically prevent idle sleep. See the examples I already gave.

I’m not sure I understand your concerns about my suggestion. Is there any downside to preventing idle sleep when running a flow locally?

fumoboy007 avatar Sep 17 '24 17:09 fumoboy007

@fohrloop’s wakepy package looks robust and supports macOS, GNOME, KDE Plasma, Freedesktop.org DE, and Windows.

Usage:

from wakepy import keep

# Keep the system awake for the duration of the block; warn if that is not possible.
with keep.running(on_fail='warn'):
    ...  # Run the flow here.

@savingoyal If you can point me to the code where the run/resume subcommands are executed, I can send a pull request.
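
If adding a hard dependency is a concern, wakepy could be treated as optional. A minimal sketch, assuming `keep.running(on_fail=...)` behaves as in the usage above; `idle_sleep_guard` and `run_flow_guarded` are hypothetical names, not existing Metaflow functions:

```python
import contextlib


def idle_sleep_guard():
    """Return a wakepy keep-awake context, or a no-op if wakepy is missing."""
    try:
        from wakepy import keep  # optional third-party dependency
    except ImportError:
        # wakepy is not installed: do nothing rather than fail.
        return contextlib.nullcontext()
    # on_fail='warn' emits a warning instead of raising if no method works.
    return keep.running(on_fail="warn")


def run_flow_guarded(run):
    """Call run() inside the guard and return its result."""
    with idle_sleep_guard():
        return run()
```

With this shape, users who install wakepy get idle-sleep prevention automatically, and everyone else sees no change in behavior.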

fumoboy007 avatar Sep 17 '24 19:09 fumoboy007

Hi! Sorry for the delay. By design, we limit the number of external dependencies, and in this case there seems to be a good-enough solution that works without any extra maintenance overhead. Happy to discuss more, so feel free to reopen the issue.

savingoyal avatar Dec 13 '24 01:12 savingoyal