turbo
turbo copied to clipboard
Scripts are run unnecessarily depending on how turbo is configured
What version of Turborepo are you using?
1.1.9
What package manager are you using / does the bug impact?
Yarn v1
What operating system are you using?
Mac
Describe the Bug
When running a command like turbo run test
, I would expect it to build the graph of tasks to run based on packages that actually define a test
task. Instead, it seems like it assumes that all packages contain that task which results in needlessly running all tasks that would be dependencies.
In the case of the reproducer below, only the bar
package contains a test
script with a dependency on foo
. The build
task in baz was executed despite the output of it not being required for executing the requested tasks.
Expected Behavior
Only packages that actually contain the tasks to run should be considered when building the task graph.
To Reproduce
- Clone https://github.com/dcherman/turborepo-bad-script-reproducer
-
yarn install
-
yarn reproduce
Since only bar
has a test
script, I expected to see the build
task from foo
and bar
run followed by the test
task from bar
. The build
task of baz
runs despite it having no test
task and it not being a dependency of either of the other packages.
Here's an image of the graph that's generated from running turbo run test --graph
Note that the both the foo
and baz
tasks contain a node for the test
task despite not actually defining a test
task in package.json.
I can see the issue here, and I think I'm maybe 99% convinced the behavior should change. I am a little worried about anyone using "synthetic" tasks to trigger upstream packages to do something, but I think we have other ways of handling that (--include-dependencies
/ --filter=<pkg>...
if it is desired behavior.
I just ran into this as well and found it surprising and it is causing issues for my use case:
I'm currently migrating a very large monorepo over to use turborepo, and trying to update our scripts one package at a time, and my pipeline contains a top level script turbo:dev
- but I cannot run turbo run turbo:dev
without filters because it will attempt to run all the dependency scripts on all of my packages (even ones that don't yet have a turbo:dev
) which ultimately fails because those scripts are misconfigured.
For now I need to pass a really long filter enumerating out ever package that has been converted to avoid this issue
I think this discussion may also be related #1106.
I also recently ran into this issue. The section of the documentation that mentions that missing tasks are gracefully ignored led me to assume missing task dependencies would also be ignored. See https://turborepo.org/docs/core-concepts/pipelines#tasks-that-are-in-the-pipeline-but-not-in-some-packagejson.
Since ignoring missing task dependencies would be a breaking change it may make sense to introduce this as a new cli flag or config option rather than overwrite the default behavior.
CLI Option turbo run build --ignore-missing-task-deps
turbo.json Option ”ignoreMissingTaskDeps”: true
.
I’d be open to a more terse option name. However, --if-present
could be a little confusing since the npm implementation is more about suppressing errors for missing scripts and has no relation to dependencies. See https://docs.npmjs.com/cli/v8/commands/npm-run-script#if-present.
I’d be happy to look into contributing a PR if that helps move this issue along.
Joining the discussion here from a similar issue that was marked as duplicate.
If I understand correctly, the current behavior can be described as:
- Turbo will attempt to run tasks that do not exist, and "gracefully ignore" them.
- Turbo will attempt to run dependencies of tasks that do not exist.
and these two behaviors repeat themselves all the way down the pipeline dependency graph until no more potential dependencies are found.
This issue appears to be discussing the second bullet point, i.e. should Turbo try to run dependencies of tasks that do not exist?
Although similar, #1135 was more related to the first bullet point, i.e. what does "gracefully ignoring" a task mean with respect to the --dry-run
output? Currently the missing tasks are filtered from the overall task list if they don't exist, but they still appear as dependencies
of existing tasks.
I'm not completely convinced these are the same issue - but maybe there are some implementation details that would result in the decision here addressing both?
@finn-orsini you're right, they are technically distinct but related.
I think we're going to move forward with a PR to prune packages that don't have the requested task, which will have the result of causing the output from --dry-run
match what is actually executed.
However, given that the above is technically a breaking change (workaround: define a no-op task), it may take a little while to land. In the meantime, I'll put up a PR to include the non-existent tasks in the --dry-run
output, so that whichever way the above change goes, at least --dry-run
will be more informative.
Would it be possible to have the expected behaviour under a flag i.e:--if-present
and in the next major default to it instead?
I'd really like to see an intermediary step too. I was actually forced to rename some of my pipline tasks because of this behavior, which is far from ideal
I will look into putting this behind a flag.
I actually think this behavior makes a lot of sense. At its core, Turborepo seems to be a task orchestration tool with comprehensive support for caching to avoid unnecessary task execution. I see a pipeline as a declaration of task dependencies and caching more than a specific list of package.json
scripts to call. I think it's counterintuitive that Turborepo would decide what I meant when I said I wanted to execute a dependent task first. Besides, isn't the point of the cache that this shouldn't matter after the first execution?
I am currently working on migrating from Nx to Turborepo in a multi-language monorepo. There are cases where "build"
has no meaning to a project, but unless the dependencies are built, it will not work. Considering that these are also paired with "test"
pipelines that depend on "build"
, it seems intuitive that it would build the dependencies all the way down.
I recognize that explicit "{package}#{task}"
pipelines can resolve this but it seems unnecessary to add a bunch of copy/pasted pipelines to tell Turborepo I want it to do something I feel like I already told it to do. Having to use --filter
calls also feels like an unnecessary workaround. It's weird to have to run turbo run build --filter={package}...
on every package to get it to actually build dependencies when I already made "build"
dependent on "^build"
. While the discussion here includes a proposal for an option to exhibit the current behavior, having to pass this to every turbo
call in our repository is excessive.
While I understand that my opinion seems to be the minority, I think there's an ideological question here about what we mean when we declare a task dependency. Approaching the problem from that angle, a better solution could be found in the configuration of the pipelines themselves. My first thought was that we could add a new pipeline configuration option to declare dependency execution intent but this doesn't feel expressive enough.
My proposal would be to use a token to tell Turborepo what we mean when we say that a task depends on another. If we want to maintain backward compatibility, a token like ^=
could say that we want to stop traversing down the graph when we meet a dependency that does not have a task. If we are fine with breaking compatibility, a token like ^~
could say that we're doing a fuzzy dependency match and want to execute a task all the way down. I like this solution because you can mix and match how a specific dependent task should behave.
@ObliviousHarmony from what you describe it sounds like your test
task should explicitly "dependsOn": ["build", "^build"]
if building dependencies are required for tests to run (regardless of whether or not the package itself needs to build).
it sounds like your test task should explicitly "dependsOn": ["build", "^build"] if building dependencies are required for tests to run
I thought about this too @mantljosh, but, it falls apart with transitive dependencies. Even if my "test"
task builds the direct dependencies of my package, when one of those also has no "build"
but has dependencies that do, we reach the same conclusion. In my case, there are packages pulled into the main app which are not built themselves but have JS dependencies that are.
Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-js packages to not have a build step, but still require build steps from their dependencies. Given that's a scenario that we want to support, I don't think we can proceed with this change as a default.
Perhaps we could augment the --filter
syntax to include matching on having a particular task? So you could run turbo run test --filter=#has-task
or something like that. Or perhaps this could be configured per-task in turbo.json
?
@gsoltis Augmenting the --filter
syntax likely wouldn't be sufficient since you would want to be able to run turbo run test --filter={package}
and have it exhibit the desired behavior too (unless you're suggesting --filter={package}#has-task
?) What do you think of my token proposal above?
@ObliviousHarmony It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build
task even if one or more project don't actually contain one.
Although this is absolutely a breaking change, you could accomplish this by adding a build
task to your project that does nothing but exit 0
. At that point, your project actually has a build
task and it's no longer surprising that it'd be included in the pipeline.
FWIW, I think the proposal to add --if-present
to the current version with a default to false
and potentially change it to default to true
in the next major version is a great idea.
It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build task even if one or more project don't actually contain one.
I don't think either behavior is objectively right. The difference of opinion here doesn't really matter, I admitted already I am probably in the minority here 😃
FWIW, I think the proposal to add --if-present to the current version with a default to false and potentially change it to default to true in the next major version is a great idea.
Honestly, I would just like to settle on a solution that doesn't require either workflow to remember to pass an argument to do the right thing. These are radically different behaviors and seem like something better handled at the configuration level.
What did you think of my proposal?
@ObliviousHarmony The problem isn't solely that the dependencies don't have the task to run, so I'm not sure that your token proposal would actually solve the problem described in the original bug. To fix that one, we need to filter the initial set of projects down to ones that actually have the task being run. I see your point about transitive dependencies though and wonder if we'd end up needing a combination of both solutions.
FWIW, that if-present
flag could be supported in config pretty easily if that's desirable - just move the flag to an ifPresent
boolean property in the config file. This could follow the same precedence rules as other options so that environment variables, config, and flags are all supported.
so I'm not sure that your token proposal would actually solve the problem described in the original bug
Ah @dcherman, I came here from the open pull request and so I misunderstood the original bug it seems. No, my proposal wouldn't solve that.
I 100% agree that running the "build"
task on projects that don't have a "test"
task doesn't make any sense. There is still a case for running a "build"
task on a project with no script but that seems like the edge case. Either a configuration option or the filter augmentation proposed by @gsoltis would work just fine. One benefit of the filtering approach is that it creates a base for the hash syntax to add modifiers to filtering.
I'd suggest an approach of applying that filtering to the nodes selected for entry into the graph and then having transitive dependencies execute tasks with the original behavior. A token probably doesn't make much sense here, I can't imagine any cases where you would want to run a "test"
command but not build the dependencies of that task. The option/filter/whatever can cover the entry "build"
edge case just fine.
As an aside, I've added basically nothing to this issue's discussion 🗡️
One concern I have with using the configuration approach is the ability to override it. Currently, tasks in turbo.json
map 1:1 to the name of the script being run, so it is not feasible to, for instance, have two copies of "test", with and without the option to include targets that don't have a "test" script. This means that regardless of whether or not we augment the config file, we must have a flag to control behavior per-execution.
This is an annoying problem because like other's have mentioned, both approaches have benefits and downsides.
I think there may be a safe incremental step to take, where we only do the "ignoring" based off the top level of the execution graph. What I mean is that if I run turbo run test
, it should immediately filter out any packages that do not have a test
script and ignore them (and their dependencies) entirely.
This would break my use case:
I have tests that run from source, however I want the cache of those tests to miss if one of their dependencies changes regardless of whether that package itself has tests.
However I would say that ^tests
is not necessarily the right thing here either as I don't care if my topological dependencies test configuration changes.
Really what I want to express is that I want to use the input hash of build
but I don't actually want to run build
. There's no way of currently doing this I believe?
Until this is addressed, here's how I've been working around this:
This script generates the set of filter flags needed to run any given task
targetScript="$1"
mapfile -t < <(find packages -name "package.json" | grep -Ev "(node_modules|bower_components)")
jq -r --arg targetScript "$targetScript" '[inputs] | map(select(.scripts[$targetScript] != null)) | map("--filter=" + .name) | join(" ")' "${MAPFILE[@]}"
Which you can then pass to CLI using expansion:
turbo run build ${FILTER_FLAGS[*]}
It just supports a single script since that's our use case, but modifying it for multi task support should be pretty easy if needed.
Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-js packages to not have a build step, but still require build steps from their dependencies.
@gsoltis In this scenario wouldn't it be reasonable to expect that you have to explicitly define dependsOn: [^build]
?
I have a slightly more minimal reproduction: https://github.com/tom-sherman/turborepo-dependency-bug
In this repo i have a graph as follows
graph TD
foo#test --> foo#build --> __root__
bar#build --> __root__
My pipeline configuration looks like this:
{
"pipeline": {
"build": {
"dependsOn": ["^build"],
"outputs": ["dist/**", ".next/**"]
},
"test": {
"dependsOn": ["build"],
"outputs": []
}
}
}
When I turbo run test
I expect foo#build
and foo#test
to run, however bar#build
also runs unexpectedly.
bar#build
is not an ancestor of any test task so I don't expect it to run.
This would not be the case if bar
was a node_modules dependency of foo
but in this example it's not, they are two sibling packages. In the scenario where I want bar#build
to run I believe it's expected to have to make the dependency between foo#test
and it explicit somehow.
@tom-sherman turbo run test
is attempting to run foo#test
and bar#test
. Since test
depends on build
, foo#build
and bar#build
are considered necessary. In this case, it's obvious we don't actually want bar#build
because we're running tests, and bar
does not have any tests.
The story is different with the build
example. Currently, a package that has no build step of its own, but requires its dependencies to be built, looks very similar to the above case with test
. I.e. filtering out the non-existent task first, before resolving dependencies, would result in that package's dependencies not being built, even with a dependency on ^build
.
So the question is how we want to handle these two scenarios. In my opinion, as a default, doing extra work is probably preferable to not doing enough work, especially if the extra work is easily cache-able. So I think the current behavior should remain as a default. But, we can absolutely look at ways of providing configuration hooks (cli flags, env vars, turbo.json
keys, etc.) to allow users to specify when they want to opt in to the desired behavior in your test example.
As an additional note, there are two flavors of skipping undefined tasks, assuming the behavior has been opted-in to:
- always skip them. Anytime you encounter an undefined task, consider it "done" and stop the dependency graph traversal
- only filter the tasks that have been specified and are not in the entrypoint packages. In this case, If you run
turbo run test
, and a package is missing atest
script, it and its dependencies won't get run. But, if a dependency of a package with atest
script is missing abuild
script, we would not stop the traversal, and instead continue with building its dependencies.
I think the latter is more likely to work as expected. Tasks that really want to stop traversal can already opt in to that behavior on a per-package-task basis by defining a package-task with an empty dependsOn
.
@gsoltis I don't understand the build example, can you please create a minimal reproduction? Really struggling to get my head around how it's the same issue as what I've described and also matching my mental model up with what you're describing
I'll see if I can get a repo together, that's probably a good idea for this discussion.
But in general, turbo
works by taking the user input for which tasks to run (build
, test
, etc.) and which packages to run them in ("entry points"), and then filling in the graph for what needs to be done to run those tasks in the entry point packages.
The essence of this discussion is what to do when you encounter a node in that graph that doesn't define the task we're looking for. Do you stop traversing and consider it done? Or do you pretend that the script exists and continue traversing to what would be its dependencies if it did exist. So a package missing a test
script and a task missing a build
script face the same decision: do we run the dependencies, even though the script is missing. The trouble comes from the difference in expectations for the two cases. For the build
case, I think consensus is "yes, run the dependencies", and for test
, it looks like we would prefer not to run the dependencies, as there are no tests to run. But without turbo
knowing the difference between test
and build
, it has no way to distinguish the two cases.
Some other distinctions that we can make for this behavior are "is this package one of the entry point packages" and "is this task one of the requested tasks" vs packages and tasks that are transitively included as dependencies.
@tom-sherman Sorry for the delay, but I have put together a sample that I believe demonstrates the issue: https://github.com/gsoltis/sample-monorepo
There are two top-level applications, only one has test
script. There is also a library that has no build
script, but depends on another library that does have a build
script.
Any updates here? This breaks my use case too.
While I think there are some difficulties that have been raised (i.e. a plain-js package that still needs its dependencies to be built even if it has no build script), I think there is an improvement that can be safely made where if I run turbo run test
then it will:
- First filter the list of packages to only ones that have a
test
script - Then run the build in the same way that is already run today (with potentially unnecessary steps being run)
This would at least let us avoid the case where turbo run test
ends up building the world, even for things that will never be used by the test
scripts.
@arturenault can you describe what specifically is broken? Is turbo
doing extra work, or is some task being run incorrectly?
My understanding of the current state of the world is that in the worst case, turbo
does unnecessary work, and the difficulty is in distinguishing what exactly is unnecessary. As you can see in the thread, there are a variety of opinions on what the expected default behavior is, so we are currently defaulting to possibly doing more, rather than possibly doing less than expected.