turbo Scripts are run unnecessarily depending on how turbo is configured

What version of Turborepo are you using?

1.1.9

What package manager are you using / does the bug impact?

Yarn v1

What operating system are you using?

Mac

Describe the Bug

When running a command like turbo run test, I would expect it to build the graph of tasks to run based on packages that actually define a test task. Instead, it seems like it assumes that all packages contain that task which results in needlessly running all tasks that would be dependencies.

In the case of the reproducer below, only the bar package contains a test script with a dependency on foo. The build task in baz was executed despite the output of it not being required for executing the requested tasks.

Expected Behavior

Only packages that actually contain the tasks to run should be considered when building the task graph.

To Reproduce

Clone https://github.com/dcherman/turborepo-bad-script-reproducer
yarn install
yarn reproduce

Since only bar has a test script, I expected to see the build task from foo and bar run followed by the test task from bar. The build task of baz runs despite it having no test task and it not being a dependency of either of the other packages.

Here's an image of the graph that's generated from running turbo run test --graph

Note that the both the foo and baz tasks contain a node for the test task despite not actually defining a test task in package.json.

Mar 24 '22 21:03 dcherman

I can see the issue here, and I think I'm maybe 99% convinced the behavior should change. I am a little worried about anyone using "synthetic" tasks to trigger upstream packages to do something, but I think we have other ways of handling that (--include-dependencies / --filter=<pkg>... if it is desired behavior.

Mar 24 '22 22:03 gsoltis

I just ran into this as well and found it surprising and it is causing issues for my use case:

I'm currently migrating a very large monorepo over to use turborepo, and trying to update our scripts one package at a time, and my pipeline contains a top level script turbo:dev - but I cannot run turbo run turbo:dev without filters because it will attempt to run all the dependency scripts on all of my packages (even ones that don't yet have a turbo:dev) which ultimately fails because those scripts are misconfigured.

For now I need to pass a really long filter enumerating out ever package that has been converted to avoid this issue

May 04 '22 14:05 mantljosh

I think this discussion may also be related #1106.

I also recently ran into this issue. The section of the documentation that mentions that missing tasks are gracefully ignored led me to assume missing task dependencies would also be ignored. See https://turborepo.org/docs/core-concepts/pipelines#tasks-that-are-in-the-pipeline-but-not-in-some-packagejson.

Since ignoring missing task dependencies would be a breaking change it may make sense to introduce this as a new cli flag or config option rather than overwrite the default behavior.

CLI Option turbo run build --ignore-missing-task-deps

turbo.json Option ”ignoreMissingTaskDeps”: true.

I’d be open to a more terse option name. However, --if-present could be a little confusing since the npm implementation is more about suppressing errors for missing scripts and has no relation to dependencies. See https://docs.npmjs.com/cli/v8/commands/npm-run-script#if-present.

I’d be happy to look into contributing a PR if that helps move this issue along.

May 10 '22 12:05 michaelkplai

Joining the discussion here from a similar issue that was marked as duplicate.

If I understand correctly, the current behavior can be described as:

Turbo will attempt to run tasks that do not exist, and "gracefully ignore" them.
Turbo will attempt to run dependencies of tasks that do not exist.

and these two behaviors repeat themselves all the way down the pipeline dependency graph until no more potential dependencies are found.

This issue appears to be discussing the second bullet point, i.e. should Turbo try to run dependencies of tasks that do not exist?

Although similar, #1135 was more related to the first bullet point, i.e. what does "gracefully ignoring" a task mean with respect to the --dry-run output? Currently the missing tasks are filtered from the overall task list if they don't exist, but they still appear as dependencies of existing tasks.

I'm not completely convinced these are the same issue - but maybe there are some implementation details that would result in the decision here addressing both?

May 11 '22 14:05 finn-orsini

@finn-orsini you're right, they are technically distinct but related.

I think we're going to move forward with a PR to prune packages that don't have the requested task, which will have the result of causing the output from --dry-run match what is actually executed.

However, given that the above is technically a breaking change (workaround: define a no-op task), it may take a little while to land. In the meantime, I'll put up a PR to include the non-existent tasks in the --dry-run output, so that whichever way the above change goes, at least --dry-run will be more informative.

May 11 '22 20:05 gsoltis

Would it be possible to have the expected behaviour under a flag i.e:--if-present and in the next major default to it instead?

May 12 '22 09:05 zanona

I'd really like to see an intermediary step too. I was actually forced to rename some of my pipline tasks because of this behavior, which is far from ideal

May 12 '22 10:05 TxHawks

I will look into putting this behind a flag.

May 12 '22 17:05 gsoltis

I actually think this behavior makes a lot of sense. At its core, Turborepo seems to be a task orchestration tool with comprehensive support for caching to avoid unnecessary task execution. I see a pipeline as a declaration of task dependencies and caching more than a specific list of package.json scripts to call. I think it's counterintuitive that Turborepo would decide what I meant when I said I wanted to execute a dependent task first. Besides, isn't the point of the cache that this shouldn't matter after the first execution?

I am currently working on migrating from Nx to Turborepo in a multi-language monorepo. There are cases where "build" has no meaning to a project, but unless the dependencies are built, it will not work. Considering that these are also paired with "test" pipelines that depend on "build", it seems intuitive that it would build the dependencies all the way down.

I recognize that explicit "{package}#{task}" pipelines can resolve this but it seems unnecessary to add a bunch of copy/pasted pipelines to tell Turborepo I want it to do something I feel like I already told it to do. Having to use --filter calls also feels like an unnecessary workaround. It's weird to have to run turbo run build --filter={package}... on every package to get it to actually build dependencies when I already made "build" dependent on "^build". While the discussion here includes a proposal for an option to exhibit the current behavior, having to pass this to every turbo call in our repository is excessive.

While I understand that my opinion seems to be the minority, I think there's an ideological question here about what we mean when we declare a task dependency. Approaching the problem from that angle, a better solution could be found in the configuration of the pipelines themselves. My first thought was that we could add a new pipeline configuration option to declare dependency execution intent but this doesn't feel expressive enough.

My proposal would be to use a token to tell Turborepo what we mean when we say that a task depends on another. If we want to maintain backward compatibility, a token like ^= could say that we want to stop traversing down the graph when we meet a dependency that does not have a task. If we are fine with breaking compatibility, a token like ^~ could say that we're doing a fuzzy dependency match and want to execute a task all the way down. I like this solution because you can mix and match how a specific dependent task should behave.

May 13 '22 15:05 ObliviousHarmony

@ObliviousHarmony from what you describe it sounds like your test task should explicitly "dependsOn": ["build", "^build"] if building dependencies are required for tests to run (regardless of whether or not the package itself needs to build).

May 17 '22 15:05 mantljosh

it sounds like your test task should explicitly "dependsOn": ["build", "^build"] if building dependencies are required for tests to run

I thought about this too @mantljosh, but, it falls apart with transitive dependencies. Even if my "test" task builds the direct dependencies of my package, when one of those also has no "build" but has dependencies that do, we reach the same conclusion. In my case, there are packages pulled into the main app which are not built themselves but have JS dependencies that are.

May 17 '22 16:05 ObliviousHarmony

Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-js packages to not have a build step, but still require build steps from their dependencies. Given that's a scenario that we want to support, I don't think we can proceed with this change as a default.

Perhaps we could augment the --filter syntax to include matching on having a particular task? So you could run turbo run test --filter=#has-task or something like that. Or perhaps this could be configured per-task in turbo.json?

May 17 '22 19:05 gsoltis

@gsoltis Augmenting the --filter syntax likely wouldn't be sufficient since you would want to be able to run turbo run test --filter={package} and have it exhibit the desired behavior too (unless you're suggesting --filter={package}#has-task?) What do you think of my token proposal above?

May 17 '22 19:05 ObliviousHarmony

@ObliviousHarmony It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build task even if one or more project don't actually contain one.

Although this is absolutely a breaking change, you could accomplish this by adding a build task to your project that does nothing but exit 0. At that point, your project actually has a build task and it's no longer surprising that it'd be included in the pipeline.

FWIW, I think the proposal to add --if-present to the current version with a default to false and potentially change it to default to true in the next major version is a great idea.

May 17 '22 19:05 dcherman

It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build task even if one or more project don't actually contain one.

I don't think either behavior is objectively right. The difference of opinion here doesn't really matter, I admitted already I am probably in the minority here 😃

FWIW, I think the proposal to add --if-present to the current version with a default to false and potentially change it to default to true in the next major version is a great idea.

Honestly, I would just like to settle on a solution that doesn't require either workflow to remember to pass an argument to do the right thing. These are radically different behaviors and seem like something better handled at the configuration level.

What did you think of my proposal?

May 19 '22 02:05 ObliviousHarmony

@ObliviousHarmony The problem isn't solely that the dependencies don't have the task to run, so I'm not sure that your token proposal would actually solve the problem described in the original bug. To fix that one, we need to filter the initial set of projects down to ones that actually have the task being run. I see your point about transitive dependencies though and wonder if we'd end up needing a combination of both solutions.

FWIW, that if-present flag could be supported in config pretty easily if that's desirable - just move the flag to an ifPresent boolean property in the config file. This could follow the same precedence rules as other options so that environment variables, config, and flags are all supported.

May 19 '22 03:05 dcherman

so I'm not sure that your token proposal would actually solve the problem described in the original bug

Ah @dcherman, I came here from the open pull request and so I misunderstood the original bug it seems. No, my proposal wouldn't solve that.

I 100% agree that running the "build" task on projects that don't have a "test" task doesn't make any sense. There is still a case for running a "build" task on a project with no script but that seems like the edge case. Either a configuration option or the filter augmentation proposed by @gsoltis would work just fine. One benefit of the filtering approach is that it creates a base for the hash syntax to add modifiers to filtering.

I'd suggest an approach of applying that filtering to the nodes selected for entry into the graph and then having transitive dependencies execute tasks with the original behavior. A token probably doesn't make much sense here, I can't imagine any cases where you would want to run a "test" command but not build the dependencies of that task. The option/filter/whatever can cover the entry "build" edge case just fine.

As an aside, I've added basically nothing to this issue's discussion 🗡️

May 19 '22 04:05 ObliviousHarmony

One concern I have with using the configuration approach is the ability to override it. Currently, tasks in turbo.json map 1:1 to the name of the script being run, so it is not feasible to, for instance, have two copies of "test", with and without the option to include targets that don't have a "test" script. This means that regardless of whether or not we augment the config file, we must have a flag to control behavior per-execution.

May 19 '22 16:05 gsoltis

This is an annoying problem because like other's have mentioned, both approaches have benefits and downsides.

I think there may be a safe incremental step to take, where we only do the "ignoring" based off the top level of the execution graph. What I mean is that if I run turbo run test, it should immediately filter out any packages that do not have a test script and ignore them (and their dependencies) entirely.

Jun 13 '22 15:06 mantljosh

This would break my use case:

I have tests that run from source, however I want the cache of those tests to miss if one of their dependencies changes regardless of whether that package itself has tests.

However I would say that ^tests is not necessarily the right thing here either as I don't care if my topological dependencies test configuration changes.

Really what I want to express is that I want to use the input hash of build but I don't actually want to run build. There's no way of currently doing this I believe?

Jun 14 '22 15:06 georeith

Until this is addressed, here's how I've been working around this:

This script generates the set of filter flags needed to run any given task

targetScript="$1"
mapfile -t < <(find packages -name "package.json" | grep -Ev "(node_modules|bower_components)")
jq -r --arg targetScript "$targetScript" '[inputs] | map(select(.scripts[$targetScript] != null)) | map("--filter=" + .name) | join(" ")' "${MAPFILE[@]}"

Which you can then pass to CLI using expansion:

turbo run build ${FILTER_FLAGS[*]}

It just supports a single script since that's our use case, but modifying it for multi task support should be pretty easy if needed.

Jun 16 '22 13:06 dcherman

Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-js packages to not have a build step, but still require build steps from their dependencies.

@gsoltis In this scenario wouldn't it be reasonable to expect that you have to explicitly define dependsOn: [^build]?

Jun 20 '22 14:06 tom-sherman

I have a slightly more minimal reproduction: https://github.com/tom-sherman/turborepo-dependency-bug

In this repo i have a graph as follows

graph TD
foo#test --> foo#build --> __root__
bar#build --> __root__

My pipeline configuration looks like this:

{
  "pipeline": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**", ".next/**"]
    },
    "test": {
      "dependsOn": ["build"],
      "outputs": []
    }
  }
}

When I turbo run test I expect foo#build and foo#test to run, however bar#build also runs unexpectedly.

bar#build is not an ancestor of any test task so I don't expect it to run.

This would not be the case if bar was a node_modules dependency of foo but in this example it's not, they are two sibling packages. In the scenario where I want bar#build to run I believe it's expected to have to make the dependency between foo#test and it explicit somehow.

Jun 20 '22 14:06 tom-sherman

@tom-sherman turbo run test is attempting to run foo#test and bar#test. Since test depends on build, foo#build and bar#build are considered necessary. In this case, it's obvious we don't actually want bar#build because we're running tests, and bar does not have any tests.

The story is different with the build example. Currently, a package that has no build step of its own, but requires its dependencies to be built, looks very similar to the above case with test. I.e. filtering out the non-existent task first, before resolving dependencies, would result in that package's dependencies not being built, even with a dependency on ^build.

So the question is how we want to handle these two scenarios. In my opinion, as a default, doing extra work is probably preferable to not doing enough work, especially if the extra work is easily cache-able. So I think the current behavior should remain as a default. But, we can absolutely look at ways of providing configuration hooks (cli flags, env vars, turbo.json keys, etc.) to allow users to specify when they want to opt in to the desired behavior in your test example.

As an additional note, there are two flavors of skipping undefined tasks, assuming the behavior has been opted-in to:

always skip them. Anytime you encounter an undefined task, consider it "done" and stop the dependency graph traversal
only filter the tasks that have been specified and are not in the entrypoint packages. In this case, If you run turbo run test, and a package is missing a test script, it and its dependencies won't get run. But, if a dependency of a package with a test script is missing a build script, we would not stop the traversal, and instead continue with building its dependencies.

I think the latter is more likely to work as expected. Tasks that really want to stop traversal can already opt in to that behavior on a per-package-task basis by defining a package-task with an empty dependsOn.

Jun 20 '22 18:06 gsoltis

@gsoltis I don't understand the build example, can you please create a minimal reproduction? Really struggling to get my head around how it's the same issue as what I've described and also matching my mental model up with what you're describing

Jun 20 '22 20:06 tom-sherman

I'll see if I can get a repo together, that's probably a good idea for this discussion.

But in general, turbo works by taking the user input for which tasks to run (build, test, etc.) and which packages to run them in ("entry points"), and then filling in the graph for what needs to be done to run those tasks in the entry point packages.

The essence of this discussion is what to do when you encounter a node in that graph that doesn't define the task we're looking for. Do you stop traversing and consider it done? Or do you pretend that the script exists and continue traversing to what would be its dependencies if it did exist. So a package missing a test script and a task missing a build script face the same decision: do we run the dependencies, even though the script is missing. The trouble comes from the difference in expectations for the two cases. For the build case, I think consensus is "yes, run the dependencies", and for test, it looks like we would prefer not to run the dependencies, as there are no tests to run. But without turbo knowing the difference between test and build, it has no way to distinguish the two cases.

Some other distinctions that we can make for this behavior are "is this package one of the entry point packages" and "is this task one of the requested tasks" vs packages and tasks that are transitively included as dependencies.

Jun 21 '22 16:06 gsoltis

@tom-sherman Sorry for the delay, but I have put together a sample that I believe demonstrates the issue: https://github.com/gsoltis/sample-monorepo

There are two top-level applications, only one has test script. There is also a library that has no build script, but depends on another library that does have a build script.

Jun 30 '22 18:06 gsoltis

Any updates here? This breaks my use case too.

Aug 11 '22 18:08 arturenault

While I think there are some difficulties that have been raised (i.e. a plain-js package that still needs its dependencies to be built even if it has no build script), I think there is an improvement that can be safely made where if I run turbo run test then it will:

First filter the list of packages to only ones that have a test script
Then run the build in the same way that is already run today (with potentially unnecessary steps being run)

This would at least let us avoid the case where turbo run test ends up building the world, even for things that will never be used by the test scripts.

Aug 15 '22 19:08 mantljosh

@arturenault can you describe what specifically is broken? Is turbo doing extra work, or is some task being run incorrectly?

My understanding of the current state of the world is that in the worst case, turbo does unnecessary work, and the difficulty is in distinguishing what exactly is unnecessary. As you can see in the thread, there are a variety of opinions on what the expected default behavior is, so we are currently defaulting to possibly doing more, rather than possibly doing less than expected.

Aug 15 '22 19:08 gsoltis

turbo turbo copied to clipboard

Scripts are run unnecessarily depending on how turbo is configured

What version of Turborepo are you using?

What package manager are you using / does the bug impact?

What operating system are you using?

Describe the Bug

Expected Behavior

To Reproduce

turbo
turbo copied to clipboard