
Turbo skipping build tasks without explanation

dobesv opened this issue 1 year ago • 28 comments

What version of Turborepo are you using?

1.4.3

What package manager are you using / does the bug impact?

Yarn v2/v3

What operating system are you using?

Linux

Describe the Bug

Recently, after making a bunch of changes to my turbo build, it seems like turbo is just skipping some build steps. It isn't pulling them from the cache - it just isn't running them.

When I do a dry-run, it does show those tasks in the list, but when I do a regular build, the tasks are simply not run.

It says at the end that it ran 20 tasks, but in the output only 10 or 11 tasks are listed as being run (whether cached or not).
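For reference, the comparison I'm describing is roughly this (with build standing in for the actual task name):

turbo run build --dry-run
turbo run build

The dry run lists all 20 tasks; the real run only executes or replays about half of them.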

See this gist for the logs:

https://gist.github.com/dobesv/d167781db339757c53cecf8a16f3c880

Expected Behavior

Turbo should run all the tasks, in the right order.

To Reproduce

I haven't put together a reproduction yet; I'm still kind of investigating. But hopefully someone has some tips on where to look to figure out the cause of this.

dobesv avatar Aug 10 '22 03:08 dobesv

I tried running without remote caching but it didn't fix it.

dobesv avatar Aug 10 '22 03:08 dobesv

However, if I pass --force it does run all 20 tasks. Very strange!

For obvious reasons I do not want to run with --force turned on all the time ... but it is interesting that this somehow causes it to run all the tasks. I wonder if --force bypasses whatever is causing those tasks to be skipped.
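For reference, that's just the stock --force flag on the same command (again, build stands in for the actual task name):

turbo run build --force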

dobesv avatar Aug 10 '22 03:08 dobesv

Running turbo with -vv I see some more output.

For example here's log output for a task that was NOT run but should have been:

2022-08-10T18:20:54.692-0700 [DEBUG] run.@formative/item-bank-graphql-schema:build:graphql2js: start
2022-08-10T18:20:54.692-0700 [DEBUG] run: task hash: value=a51cecb3365c2ee0
2022-08-10T18:20:54.694-0700 [DEBUG] run.@formative/item-bank-graphql-schema:build:graphql2js: log file: path=/home/ubuntu/app2/packages/item-bank/item-bank-graphql-schema/.turbo/turbo-build:graphql2js.log

Here's log output for a task that fetched from cache OK:

2022-08-10T18:20:54.698-0700 [DEBUG] run.@formative/submissions-graphql-schema:build:graphql2js: start
2022-08-10T18:20:54.698-0700 [DEBUG] run: task hash: value=a51cecb3365c2ee0
2022-08-10T18:20:54.701-0700 [DEBUG] run.@formative/submissions-graphql-schema:build:graphql2js: log file: path=/home/ubuntu/app2/packages/formatives/submissions-graphql-schema/.turbo/turbo-build:graphql2js.log
2022-08-10T18:20:54.701-0700 [DEBUG] run.@formative/submissions-graphql-schema:build:graphql2js: start replaying logs
@formative/submissions-graphql-schema:build:graphql2js: cache hit, replaying output a51cecb3365c2ee0
@formative/submissions-graphql-schema:build:graphql2js: { changedCount: 0, fileCount: 5 } graphql2js finished.
2022-08-10T18:20:54.701-0700 [DEBUG] run.@formative/submissions-graphql-schema:build:graphql2js: finish replaying logs

For the task it doesn't run, it prints neither cache miss nor cache hit, and it doesn't run the command. But it does print start and the name of the log file. So it must be kind of "crashing" after that without printing any errors. Maybe some missing error handling somewhere?

It's also very weird that although most of the packages are quite similar in their structure and dependencies, it just skips some but not all. If it completely skipped all of these I'd be more inclined to believe this was my own fault rather than a bug in turborepo.

dobesv avatar Aug 11 '22 01:08 dobesv

Did this previously work for you in v1.3? Wondering if the changes from automatic env variable inclusion could have caused this (although that change should only cause the opposite: the cache being invalidated more often).

mehulkar avatar Aug 11 '22 01:08 mehulkar

I tried it again in v1.3.4 and the issue remains.

dobesv avatar Aug 11 '22 20:08 dobesv

I also tried in v1.3.0.

Interestingly, it seems to run a different subset of tasks when I change versions. So I think the subset of tasks it runs depends to some degree on hashes.

dobesv avatar Aug 11 '22 20:08 dobesv

I am starting to suspect this may be caused by some kind of caching conflict.

So the way it fetches from the cache is that it requests the cache data based on a task hash, and then filters which files it writes out according to the task's expected outputs.

My suspicion is that if some tasks have the same hash, they could be reading/writing the same cache key, so no files are restored from the cache because the files saved under that key are in the wrong folders.

When I look at the logs, I do see that many of the tasks that are not being run have the same hash.

I'll see if I can make a custom build of turborepo somehow and confirm this is the issue. I think some changes to the caching logic might be needed:

  1. Include the workspace and task in the hashing input so that the cache key differs between workspaces
  2. Make sure the cache data records which workspace and task it is for, and ignore cached data that happens to collide with data from a different workspace and task
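By the way, a quick way to check which tasks end up with the same hash, assuming the dry-run JSON output exposes taskId and hash fields (this may vary by version):

turbo run build --dry-run=json | jq '.tasks[] | {taskId, hash}'

Any two entries that share a hash would be candidates for this kind of collision.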

dobesv avatar Aug 12 '22 00:08 dobesv

Thinking on it more, it makes no sense that the hashes would be the same for those tasks, since they have different inputs. It makes me wonder if there's another issue here where the inputs are not being hashed as expected. The inputs are set to:

      "inputs": ["src/**/*.graphql"],

And as far as I know that does match different files in the different workspaces, yet it is assigning the same hash to each of the workspaces.
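For context, that inputs line sits in the turbo.json pipeline entry for this task (the task name is the one from the logs above; other keys for the task omitted), roughly:

      "pipeline": {
        "build:graphql2js": {
          "inputs": ["src/**/*.graphql"]
        }
      }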

dobesv avatar Aug 12 '22 19:08 dobesv

Maybe ** doesn't work the way I think it does. If I change the config to look like this then the problem goes away because the hashes are not the same any more:

      "inputs": ["src/*.graphql", "src/**/*.graphql"],

dobesv avatar Aug 12 '22 21:08 dobesv

Hmm, this is very strange: when I use the pattern src/**/*.graphql it gives the same task hash for a number of workspaces with this task, but NOT all of them. And they have similar folder structures; I'm only using ** to be on the safe side. Using an alternative like src/*.graphql or **/*.graphql works and gives unique hashes to each workspace.

dobesv avatar Aug 12 '22 21:08 dobesv

So I guess I'm hitting a conjunction of two very weird issues:

  1. Something weird going on with globbing where task hashes are equal but shouldn't be
  2. Some underlying assumption that task hashes will never be equal, but they are in this case

I guess I do have a workaround to adjust the globs but I wonder if this could be causing any other subtle build issues.

dobesv avatar Aug 12 '22 21:08 dobesv

Aaaahhhhhh..... found the globbing issue!

https://github.com/vercel/turborepo/issues/1241

dobesv avatar Aug 12 '22 21:08 dobesv

The issue with hash conflicts and the cache remains, but shouldn't affect me as long as I always have unique input files for the task. Man, that globbing thing probably caused me 8+ hours of debugging here, I thought I was going crazy!

dobesv avatar Aug 12 '22 22:08 dobesv

@dobesv can you try the latest canary? I've updated the inputs globbing to be in line with the rest of turbo.

Failing that, can you put together a small repo that demonstrates the problem? I.e. maybe two packages that shouldn't get the same hash but do?

gsoltis avatar Aug 25 '22 20:08 gsoltis

Hmm I'll see. I have a lot on my plate now with the start of the school year. I think you can see just by inspection that the hashing logic doesn't include the workspace in it, so two workspaces with the same input hashes will have the same cache key even though the paths are different.

dobesv avatar Aug 25 '22 23:08 dobesv

Are the input files actually the same? If so, I think I maybe see the problem: if you are doing different things (running different scripts) with the same input files, even in different locations, you need to include the definition of what it is that you're doing in your inputs. In this case, that probably means package.json.
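In turbo.json terms, that would be something like this (just a sketch, reusing the glob from earlier in the thread):

      "inputs": ["src/**/*.graphql", "package.json"],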

Otherwise, if you are doing the same thing with the same files, but they are located in different packages, that should work. I wouldn't be surprised if there were some non-determinism in that case though. If this is the situation you have, can you confirm that the outputs end up as expected?

gsoltis avatar Aug 26 '22 17:08 gsoltis

Are the input files actually the same?

Because of the globbing confusion, the inputs were the same - no files at all! - since I used a glob with /**/ in it, which isn't supported by inputs.

The hash collision issue is definitely an odd case, so I'm not sure how many people would actually run into it. It does seem odd to have two packages with the same task and the same inputs.

I could imagine it could happen from time to time. Maybe you're just starting a new package and you copied files from the other one to get started, and you run a build, and weirdly it just says cache hit but doesn't output a file.

It seems simple enough to fix: just add the workspace name to the list of things that get hashed, or make it a prefix of the cache key, something like that.

dobesv avatar Aug 26 '22 19:08 dobesv

To reproduce this hash issue I believe you just need two workspaces that are identical in all but name - the task hashes will be the same, and if one of them restores the build results cached from the other, nothing is output.
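Roughly (hypothetical package names, and build again stands in for the actual task):

cp -r packages/pkg-a packages/pkg-b
# edit packages/pkg-b/package.json so only the "name" field differs
turbo run build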

dobesv avatar Aug 26 '22 19:08 dobesv

I'll assume for now nobody really cares to fix this weird one ...

dobesv avatar Nov 03 '22 23:11 dobesv

@dobesv Apologies for the delay; we definitely do care about getting to the bottom of this. The log output you gave is pretty large and difficult to sit down with, so our first goal is to improve the information we give you about cache hits/misses and how the hashes are calculated. We will get to it eventually if you want to keep the issue open!

Alternatively, you can also build and debug locally if you like! I'm happy to help you get things set up if you want to get a head start.

mehulkar avatar Nov 04 '22 00:11 mehulkar

It's not too important, as it only arises if you have two tasks with identical inputs, and thus the same hash. An easy fix is to always put the workspace path into the hash. Or perhaps it actually only happens with zero inputs, as was the case here because of the wildcard confusion.

dobesv avatar Nov 04 '22 02:11 dobesv

This may have actually been fixed as a side effect of a cache behavior update that appears in 1.6.

I haven't tested that hypothesis, but I suspect we now do the right thing if we have two identical workspaces. Will confirm or deny tomorrow.

nathanhammond avatar Nov 09 '22 16:11 nathanhammond

So, a couple of things:

  1. We have made some modifications to our default hashing pattern to include each package's package.json in the task hash, so it is much harder to find yourself in this situation - even with packages that are cp -r apps/docs apps/duplicate levels of identical.
  2. But, in testing exactly that, I discovered a new edge case:
npx create-turbo@latest
cd ./my-turborepo
cp -r apps/docs apps/dolly
pnpm i
pnpm run build

Three apps, but only two task runs:

Tasks:    2 successful, 2 total

I have to sort out that edge case before I'm even able to further investigate your original report because right now it's actually impossible to recreate your original situation. 🤦‍♂️

Also, yes, we should be including the package's path into the task hash.

nathanhammond avatar Nov 10 '22 06:11 nathanhammond

Since you seem to be working on this, I reopened it for now.

dobesv avatar Nov 10 '22 06:11 dobesv

That said, I think it's reasonable to suspect that the issue has been fixed; it seems like the sort of thing that someone might fix in passing.

dobesv avatar Nov 10 '22 06:11 dobesv

Just for reference: we don't close out old issues until they're actually done or we know we don't intend to do anything with them. For things like this, what has most likely happened is that it fell below the fold. You probably noticed that by bumping the thread it ended up back on our radar and we did something about it. 😅

nathanhammond avatar Nov 10 '22 07:11 nathanhammond

(@dobesv we really appreciate the time that people spend reporting issues, and we want to make sure we respect that investment of time and energy, take that feedback, and turn Turborepo into a better product. 🖤)

nathanhammond avatar Nov 10 '22 07:11 nathanhammond

See also #2659

nathanhammond avatar Nov 10 '22 07:11 nathanhammond