azuredatastudio icon indicating copy to clipboard operation
azuredatastudio copied to clipboard

node_modules cache in pipeline misses frequently

Open Charles-Gagnon opened this issue 3 years ago • 3 comments

Our pipelines have steps to save/restore zips of the node_modules folders to avoid needing to re-download them every time, but this doesn't seem to be working correctly as it will frequently have a cache miss for PRs that don't have any changes to package.json. We should likely only need to redownload the packages when there are changes to the packages themselves.

e.g. https://mssqltools.visualstudio.com/CrossPlatBuildScripts/_build/results?buildId=170113&view=logs&j=c7493abb-a1f4-533f-2d24-71780a69f247&t=5fc456a7-9c5f-500f-2da3-8aee44c87ba8

Investigation for this should start at looking into how the cache key is calculated and how the cache job uses that key (is it per branch?)

Charles-Gagnon avatar Sep 07 '22 20:09 Charles-Gagnon

Per https://docs.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#cache-isolation-and-security there is some scoping that is per branch, which may be the source of the issues.

But it also mentions and a job building a PR has read access to the caches for the PR's target branch (for the same pipeline) - so as long as the target branch (main) is the same then I would expect it to be able to restore from that. So that's worth investigating - maybe I'm misunderstanding what that means.

@aasimkhan30 was this something you looked into when you made the cache updates a long time back?

Charles-Gagnon avatar Sep 07 '22 21:09 Charles-Gagnon

Yeah, I had come across this information when I working with pipeline cache and was hitting a cache miss issue. My issue had nothing to do with this information. It just came to my mind when I saw this issue.

aasimkhan30 avatar Sep 07 '22 21:09 aasimkhan30

However, it feels like we are hitting some kind of scoping issue because there are cache misses for the same key across different PR runs.

Here are 2 different runs with same fingerprint having cache misses:

Fingerprint 1: nodeModules|Linux|zWU6I00RwbHdNUVQywXM0A9t7ARLDaHzFAY3LAQtUyQ=

https://mssqltools.visualstudio.com/CrossPlatBuildScripts/_build/results?buildId=169966&view=logs&j=c7493abb-a1f4-533f-2d24-71780a69f247&t=5fc456a7-9c5f-500f-2da3-8aee44c87ba8&l=18

Fingerprint2: nodeModules|Linux|zWU6I00RwbHdNUVQywXM0A9t7ARLDaHzFAY3LAQtUyQ=

https://mssqltools.visualstudio.com/CrossPlatBuildScripts/_build/results?buildId=169966&view=logs&j=c7493abb-a1f4-533f-2d24-71780a69f247&t=5fc456a7-9c5f-500f-2da3-8aee44c87ba8&l=18

aasimkhan30 avatar Sep 07 '22 21:09 aasimkhan30

As far as I'm seeing, the nodeModules part of the key is treated like a file path when I read the documentation, I wonder if this is intended or not. The cache key is seemingly generated in computeNodeModulesCacheKey, which adds the package.json and yarn.lock files to an SHA hash (via the update function).

Also, my interpretation of the document is that caches are isolated entirely from each other and that the job can only access the cache that is intended for the particular pipeline (they are not supposed to be shared even for the same pipeline). The implementation for this will be tricky to do, but that's for later.

smartguest avatar Oct 05 '22 20:10 smartguest

https://github.com/microsoft/azure-pipelines-tasks/issues/12901 This is a report of a similar, if not same issue from another user of Azure Pipelines

This comment is relevant: https://github.com/microsoft/azure-pipelines-tasks/issues/12901#issuecomment-628739689

smartguest avatar Oct 05 '22 20:10 smartguest

From my findings of the pipeline, it appears any time you update an extension package or any file listed under build/npm/dirs.js the build cache key is reset as a result, this is by design according to the code. If any changes are made that DO NOT affect the extension package or node modules, the cache is preserved and will be reused.

smartguest avatar Oct 05 '22 22:10 smartguest

Make sure you're looking at the correct scripts - we use the sql-product-compile for our pipeline

Which uses sql-computeNodeModulesCacheKey.ts

This does not use npm/dirs.js. The one VS Code uses (and we inherit but don't actually use) does though, but that's not relevant for this discussion.

Charles-Gagnon avatar Oct 05 '22 22:10 Charles-Gagnon

As the above mentions - the main problem seems to be because we have separate pipeline jobs for Product Build, ad-hoc builds and PR runs. And that's just not something that the cache task supports - you can only share cache artifacts with other runs in the same pipeline (along with the additional scoping information detailed above).

So to "fix" this so they can more consistently reuse the same cache artifacts (and still using this task) we need to consider combining these all into the same pipeline definition on ADO.

What @aasimkhan30 said above is interesting, but those builds expired (next time make sure you retain them) so we can't really investigate. If we can find another example of that happening though that might indicate an actual bug in the cache task though since multiple runs of the same pipeline should be able to re-use the same cache artifacts as long as the scoping rules are followed.

Charles-Gagnon avatar Oct 05 '22 22:10 Charles-Gagnon

Yeah, just trying to see if nothing is malfunctioning under the current design as it is, so any changes to yarn.lock files plus a couple of other files. If nothing is out of the ordinary for the current design, I'll close this issue as Karl suggested I do (well specifically, for nightly builds having a hit)

smartguest avatar Oct 05 '22 22:10 smartguest

We shouldn't close this issue - we haven't fixed anything and we can make improvements as I suggested above if we combine our pipelines all into a single ADO pipeline definition so they can actually reuse the cache artifacts.

Charles-Gagnon avatar Oct 05 '22 22:10 Charles-Gagnon

Alright then, so that'll take some time to figure out how to merge them all.

smartguest avatar Oct 05 '22 22:10 smartguest

Alternatively - a simpler approach might be to just have a scheduled task on the PR pipeline that runs daily (or more often) on the main branch. That way all the PRs against main should be able to use the cache artifacts from that build.

We could even consider merging the canary build into the PR build (and rename it to just be Azure Data Studio - Validation or something more generic) since that runs every hour.

Charles-Gagnon avatar Oct 05 '22 22:10 Charles-Gagnon

This won't change anything for the product build, but we'll have to pull down the node_modules at some point regardless every time there's a change, and from looking at the history of the product build pipeline it does seem to be re-using the cache artifacts as expected.

(lately we've had a lot of yarn.lock changes so it hasn't been able to restore for a while, but when no changes are made I see it re-using the task such as in https://mssqltools.visualstudio.com/CrossPlatBuildScripts/_build/results?buildId=172796&view=logs&j=c7493abb-a1f4-533f-2d24-71780a69f247&t=f5d49255-f229-5ab6-a621-e5b2039a4806)

Charles-Gagnon avatar Oct 05 '22 22:10 Charles-Gagnon

As a test I ran a PR build against main last night and it looks like that did work as expected

https://mssqltools.visualstudio.com/CrossPlatBuildScripts/_build/results?buildId=173685&view=logs&j=c7493abb-a1f4-533f-2d24-71780a69f247&t=5fc456a7-9c5f-500f-2da3-8aee44c87ba8

Used scope: 304;ae14e11c-7eb2-46af-b588-471e6116d635;refs/heads/main;Microsoft/azuredatastudio
Missed on the following scopes: 304;ae14e11c-7eb2-46af-b588-471e6116d635;refs/heads/lewissanchez/collapse-expand-all-props;Microsoft/azuredatastudio
Entry found at fingerprint: `nodeModules|Linux|gMeSZHlbjPkplm0G7b4j/jtJ/FJlKO4dYrz4Qj9lzag=`

Charles-Gagnon avatar Oct 06 '22 16:10 Charles-Gagnon

Updated the PR pipeline to run against main daily which should let PR runs re-use that cache (as long as they aren't changing anything in package.json's)

Further improvements/consolidations can be considered later if this continues to be an issue with failed yarn installs.

Charles-Gagnon avatar Oct 06 '22 21:10 Charles-Gagnon