
Interruptible tasks are terminated without being restarted

Open niba opened this issue 1 year ago • 10 comments

Verify canary release

  • [X] I verified that the issue exists in the latest Turborepo canary release.

Link to code that reproduces this issue

Do I need to?

What package manager are you using / does the bug impact?

pnpm

What operating system are you using?

Mac

Which canary version will you have in your reproduction?

Turborepo v2.2.4-canary.9

Describe the Bug

I have three persistent apps: app (Next.js), api (Node), and worker (Node). My Node projects are marked as interruptible: true in the Turbo config.
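
For context, here is a minimal sketch of the kind of turbo.json this describes (package and task names are illustrative, not copied from my actual config):

{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "app#dev": {
      "cache": false,
      "persistent": true
    },
    "api#dev": {
      "cache": false,
      "persistent": true,
      "interruptible": true
    },
    "worker#dev": {
      "cache": false,
      "persistent": true,
      "interruptible": true
    }
  }
}

With this shape, app#dev should be left running on unrelated changes, while api#dev and worker#dev are allowed to be stopped and restarted when one of their dependencies changes.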

I start my project using Turbo in watch mode. There are two different scenarios:

  1. [WORKS] When I change a package that ALL my apps depend on, the interruptible apps are restarted correctly.
  2. [DOESN'T WORK] When I change a package that only SOME of my apps depend on, the problem hits the tasks that do NOT depend on the modified package: if those tasks are marked as interruptible, they receive a SIGINT (exit code 130) and are never started again.

For example, when I change code in the Next.js app, my worker task's status icon changes from "in progress" to "done", and its last log output is shown in the attached screenshot.

My api task doesn't log anything and just hangs indefinitely (it doesn't respond to any HTTP requests, so it has probably been killed).

I've tried different versions of Turbo, but none of them work.

Expected Behavior

If I change a package that is not a dependency of a persistent task marked as interruptible, then that persistent task should not receive SIGINT.

If this issue is difficult to fix, then at least add a restart mechanism for all interruptible tasks.

To Reproduce

Should be easy to reproduce in tests?

Additional context

No response

niba avatar Nov 12 '24 11:11 niba

I reproduced this issue easily in this repo

All one should need to do is start with yarn dev --ui=tui and make a change to _app.tsx (or a similar file in next-app), even one as simple as adding a comment; the express server will then die.

choover-circle avatar Nov 26 '24 17:11 choover-circle

I also feel it's worth sharing some logs. I ran with -vvv:

2024-11-26T15:39:20.837-0800 [TRACE] log: signal: Want
2024-11-26T15:39:20.837-0800 [TRACE] log: signal: Want
next-app:dev:  ✓ Ready in 1390ms
2024-11-26T15:39:43.382-0800 [TRACE] tokio_util::codec::framed_impl: attempting to decode a frame
2024-11-26T15:39:43.382-0800 [TRACE] tokio_util::codec::framed_impl: frame decoded from buffer
2024-11-26T15:39:43.382-0800 [TRACE] tokio_util::codec::framed_impl: attempting to decode a frame
2024-11-26T15:39:43.382-0800 [DEBUG] turborepo_lib::process::child: stopping child process
2024-11-26T15:39:43.382-0800 [DEBUG] turborepo_lib::process::child: starting shutdown
2024-11-26T15:39:43.382-0800 [DEBUG] turborepo_lib::process: waiting for 2 processes to exit
2024-11-26T15:39:43.382-0800 [DEBUG] turborepo_lib::process::child: sending SIGINT to child 76537
2024-11-26T15:39:43.382-0800 [TRACE] turborepo_lib::process: process exited: Ok(Some(Finished(Some(0))))
2024-11-26T15:39:43.382-0800 [DEBUG] turborepo_lib::process::child: waiting for child 76537
express-app:dev: SIGINT signal received.
2024-11-26T15:39:43.389-0800 [TRACE] turborepo_lib::process::child: sending child exit
2024-11-26T15:39:43.389-0800 [DEBUG] turborepo_lib::process::child: child process stopped
2024-11-26T15:39:43.389-0800 [TRACE] turborepo_lib::process::child: exit channel was updated
2024-11-26T15:39:43.389-0800 [TRACE] turborepo_lib::process: process exited: Ok(Some(Killed))
2024-11-26T15:39:43.389-0800 [DEBUG] turborepo_lib::process: waiting for 0 processes to exit
2024-11-26T15:39:43.390-0800 [TRACE] turborepo_lib::run::watch: handling run with changed packages: Some({Other("next-app")})

Everything after next-app:dev: ✓ Ready in 1390ms was the result of changing the code in next-app.

sending SIGINT to child 76537

This is the log line where the yarn dev process in the express-server was killed, and I believe it shouldn't have been killed.

Here are the full logs: https://gist.github.com/choover-circle/49ff56968127c34e41fca5914cf4d1d4

choover-circle avatar Nov 27 '24 05:11 choover-circle

I temporarily fixed this using the new siblings functionality. The Node server is still set up as persistent but is no longer interruptible. I then use siblings to run my build step with watch enabled in parallel, while nodemon restarts the Node server. Perhaps there's something causing the Node server to be killed because it's set as interruptible?
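
Roughly, the package-level half of that workaround looks like this (a sketch only; script names and file paths are illustrative, and I've left out the Turbo-side siblings wiring since the exact config shape may differ between canary releases):

{
  "scripts": {
    "dev": "nodemon --watch dist dist/index.js",
    "build:watch": "tsc --watch --preserveWatchOutput"
  }
}

The dev task stays persistent (but not interruptible) in turbo.json, build:watch runs alongside it, and nodemon handles the restarts instead of Turborepo.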

choover-circle avatar Nov 27 '24 19:11 choover-circle

Another workaround that I found was to use the watch command from tsx to monitor and restart persistent, non-interruptible tasks. tsx watch (v4.19.2) also detects changes inside internal package dependencies of the watched package.
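
For example (a sketch; the entry path is illustrative), the Node app's dev script becomes:

{
  "scripts": {
    "dev": "tsx watch src/index.ts"
  }
}

with the corresponding dev task kept as persistent but not interruptible in turbo.json, so Turborepo leaves the process alone and tsx handles the restarts.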

dmstoykov avatar Jan 09 '25 11:01 dmstoykov

Is there an update on fixing this?

choover-circle avatar Feb 11 '25 22:02 choover-circle

I believe I've identified the root cause of this issue. In watch mode, when a change is detected, Turborepo currently stops all tasks in the run indiscriminately (other than the special non-interruptible tasks), and then only starts/restarts the tasks that are impacted by the changed package.

The flow here looks like this (may have some mistakes in my analysis):

  1. A change event is received, and the event_fut triggers the changed packages flow and notifies the main run_fut https://github.com/vercel/turborepo/blob/5c2a23cbff5e441523480f00808517db24c4aa2f/crates/turborepo-lib/src/run/watch.rs#L178-L183
  2. The loop gets "notified" due to the change, and calls the stopper of the main run (which includes interruptible persistent tasks), because handle_change_event did insert a package. https://github.com/vercel/turborepo/blob/5c2a23cbff5e441523480f00808517db24c4aa2f/crates/turborepo-lib/src/run/watch.rs#L198-L206
  3. Now, the stopper stops all the non-immune tasks in progress. This is already bad, because these tasks should stay up until they actually need to be restarted, but here they're killed by any small change.
  4. execute_run is run with the ChangedPackages::Some(...) branch hit, so only packages that are dependents of the changed package actually get run again. This is problematic because with multiple entrypoints (multiple root tasks, as in this issue), a dependency changing does not guarantee that all root tasks get restarted.

It seems like this issue mostly stems from the watch design not properly taking into account multiple root tasks. An interesting observation is that the above behavior is not actually problematic with a single root persistent task, because by definition it can only be running once all of its non-persistent dependencies have already run, and so can only be killed by a valid dependency of it changing.

Or put another way, with a single root task, any package changing is the same thing as a valid dependency invalidating its dependents. This assumption is not true for multiple root tasks however.


Unfortunately, I tried to address this with some small changes, but I think properly fixing it will require larger changes that I'm not suited to make myself.

Specifically, the change that needs to be made: when a given package changes, it should invalidate (and kill) only its own dependent tree, and then restart precisely that tree. I'm not confident about how to implement this, given that the run stopper applies to the entire tree of tasks regardless of how many leaf nodes sit at the top. The current stopper system has no sense of granularity or package identity; it's just a pile of processes that get killed all or nothing.

This issue in general seems quite related to https://github.com/vercel/turborepo/pull/9330, for reference.

@NicholasLYang maybe you have some insight here or an easy way to make the package "cleanup" process more tailored?

johnpyp avatar Feb 25 '25 12:02 johnpyp

Bumping this. A potential root cause has been identified and I have uploaded a reproduction repo. What would the timeline be for getting this fixed?

choover-circle avatar May 27 '25 15:05 choover-circle

I have also tripped over this and would be happy to test any prospective fixes.

crimeminister avatar May 28 '25 16:05 crimeminister

Same here.

alihamoody avatar May 30 '25 18:05 alihamoody

Facing the same problem. Editing a persistent Expo app causes another persistent, interruptible Node.js app to hang. The two apps depend on shared packages but don't depend on each other.

alexanderdavide avatar Jun 06 '25 09:06 alexanderdavide

Would love it if this issue could get some attention. This is a very painful thing to deal with.

choover-circle avatar Aug 15 '25 16:08 choover-circle

Same here, would love to see a fix.

gkpln3 avatar Sep 14 '25 18:09 gkpln3

Same here.

sstepanchuk avatar Sep 22 '25 15:09 sstepanchuk

Yes please. @gkpln3 any closer to a fix?

manan avatar Oct 16 '25 02:10 manan

Reproduced on my end too

iodimitrov avatar Oct 19 '25 21:10 iodimitrov