gradle-node-plugin
gradle-node-plugin copied to clipboard
Improve inputs/outputs on NpmInstall tasks
From the npm docs: https://docs.npmjs.com/cli/v7/commands/npm-install#description
This command installs a package and any packages that it depends on. If the package has a package-lock, or an npm shrinkwrap file, or a yarn lock file, the installation of dependencies will be driven by that, respecting the following order of precedence:
npm-shrinkwrap.json
package-lock.json
yarn.lock
In practice, this means that when running npm install
and, for example, a package-lock.json is present, that package lock file will serve as an input and not as an output.
When it comes to package-lock.json
we have a tricky situation, with the current functionality the npmInstall
task can be configured to run npm ci
instead of npm install
and in that case package-lock.json
is only ever used as an input, but when it comes to npm install
it will modify the file if it installs.
Now if my memory serves we opted for having this as an output to prevent npm install
from running, changing the file, and then running again on the next build since the input has changed, though this might not be the optimal solution.
B
Thanks for the insights, @deepy .
but when it comes to npm install it will modify the file if it installs.
If you run npm install
(with no args) and have a package-lock.json present, the package-lock.json file is read but not written (i.e. it is then not an input but an output).
It's tricky given the various behaviours of the underlying npm install command.
Is that still true if package.json
has changed but not package-lock.json
?
(which isn't necessarily the use-case we want to optimise for)
Is that still true if package.json has changed but not package-lock.json?
My understanding is that whenever you modify package.json, it always implicitly updates the package-lock.json.
I'm aware of the following scenarios with npm install
:
-
npm install
is run when package.json and package-lock.json are both not present --> both files are created (i.e. they are both outputs) -
npm install
is run when only package.json is present --> package-lock.json is created (i.e. one is an input, the latter is an output) -
npm install
is run when package.json and package-lock.json are both present --> both files are updated (i.e. they are both outputs)
The three scenarios above show that with Gradle's input/output annotations you cannot properly model all scenarios. After discussion with one of our build caching experts, we believe it is best to drop the inputs and outputs annotations on the NpmInstall task alltogether and to always run the task, and let npm decide what to run and what to avoid to run.
Marking the node_modules directory as an output is also problematic. This directory can be very, very big, leading to (performance) overhead when Gradle has to calculate the hash of the node_modules folder and when copying in and out the content of the node_modules folder. At the same time, npm itself is already doing basically the same in terms of file I/O by copying over the needed modules from a local cache. Thus, there is not much to gain by using Gradle's caching functionality on top of npm's caching. The latest npm version even has a package-lock.json in the node_modules folder to speed up calculating the state of the node_modules folder. This behavior cannot be mimicked in Gradle.
When one build tool (Gradle) calls another build tool (npm), we feel it is best not to have the invoking build tool duplicate the optimizations that are present in the invoked build tool. The mimicking of the optimization logic might not be fully correct in all cases and as the underlying build tool evolves it will become increasingly hard/impossible for the invoking build tool to provide the same optimzation logic. Also, even if one could fully correctly duplicate the optimization logic in the invoking build tool there is very likely no performance gain compared to what the underlying build tool achieves given the highly dominating I/O operations.
Might be able to use new feature in Gradle 7.3 to just disable state tracking for task.
Task.doNotTrackState()
https://github.com/gradle/gradle/issues/9095
Gradle 7.3 has been released and talks about npm specifically as something that would benefit from the doNotTrackState
feature.
https://docs.gradle.org/7.3/userguide/more_about_tasks.html#sec:untracked_external_tool
Opting out of tracking requires 7.3 and that requires groovy 3 which requires spock 2.0 which breaks (see #207)
So I'm moving this to 3.3
@deepy Did you have an issue upgrading to 7.3? I don't recall having to do anything special to upgrade and have been using the doNotTrackState
feature.
@nickcaballero 7.3 upgrade was smooth but required upgrading to groovy 3 and that meant upgrading spock to 2.0 (from 1.3) which broke the RunWithMultipleVersionsExtension
I looked into it briefly but didn't have enough time to fix it
So no issues with Gradle, but our test suite broke with the spock upgrade