gradle-node-plugin icon indicating copy to clipboard operation
gradle-node-plugin copied to clipboard

Improve inputs/outputs on NpmInstall tasks

Open etiennestuder opened this issue 3 years ago • 10 comments

From the npm docs: https://docs.npmjs.com/cli/v7/commands/npm-install#description

This command installs a package and any packages that it depends on. If the package has a package-lock, or an npm shrinkwrap file, or a yarn lock file, the installation of dependencies will be driven by that, respecting the following order of precedence:

npm-shrinkwrap.json
package-lock.json
yarn.lock

In practice, this means that when running npm install and, for example, a package-lock.json is present, that package lock file will serve as an input and not as an output.

etiennestuder avatar Mar 19 '21 10:03 etiennestuder

When it comes to package-lock.json we have a tricky situation, with the current functionality the npmInstall task can be configured to run npm ci instead of npm install and in that case package-lock.json is only ever used as an input, but when it comes to npm install it will modify the file if it installs.

Now if my memory serves we opted for having this as an output to prevent npm install from running, changing the file, and then running again on the next build since the input has changed, though this might not be the optimal solution. B

deepy avatar Mar 19 '21 14:03 deepy

Thanks for the insights, @deepy .

but when it comes to npm install it will modify the file if it installs.

If you run npm install (with no args) and have a package-lock.json present, the package-lock.json file is read but not written (i.e. it is then not an input but an output).

It's tricky given the various behaviours of the underlying npm install command.

etiennestuder avatar Mar 19 '21 14:03 etiennestuder

Is that still true if package.json has changed but not package-lock.json?

(which isn't necessarily the use-case we want to optimise for)

deepy avatar Mar 19 '21 14:03 deepy

Is that still true if package.json has changed but not package-lock.json?

My understanding is that whenever you modify package.json, it always implicitly updates the package-lock.json.

etiennestuder avatar Mar 19 '21 14:03 etiennestuder

I'm aware of the following scenarios with npm install:

  • npm install is run when package.json and package-lock.json are both not present --> both files are created (i.e. they are both outputs)
  • npm install is run when only package.json is present --> package-lock.json is created (i.e. one is an input, the latter is an output)
  • npm install is run when package.json and package-lock.json are both present --> both files are updated (i.e. they are both outputs)

The three scenarios above show that with Gradle's input/output annotations you cannot properly model all scenarios. After discussion with one of our build caching experts, we believe it is best to drop the inputs and outputs annotations on the NpmInstall task alltogether and to always run the task, and let npm decide what to run and what to avoid to run.

Marking the node_modules directory as an output is also problematic. This directory can be very, very big, leading to (performance) overhead when Gradle has to calculate the hash of the node_modules folder and when copying in and out the content of the node_modules folder. At the same time, npm itself is already doing basically the same in terms of file I/O by copying over the needed modules from a local cache. Thus, there is not much to gain by using Gradle's caching functionality on top of npm's caching. The latest npm version even has a package-lock.json in the node_modules folder to speed up calculating the state of the node_modules folder. This behavior cannot be mimicked in Gradle.

When one build tool (Gradle) calls another build tool (npm), we feel it is best not to have the invoking build tool duplicate the optimizations that are present in the invoked build tool. The mimicking of the optimization logic might not be fully correct in all cases and as the underlying build tool evolves it will become increasingly hard/impossible for the invoking build tool to provide the same optimzation logic. Also, even if one could fully correctly duplicate the optimization logic in the invoking build tool there is very likely no performance gain compared to what the underlying build tool achieves given the highly dominating I/O operations.

etiennestuder avatar Apr 23 '21 13:04 etiennestuder

Might be able to use new feature in Gradle 7.3 to just disable state tracking for task.

Task.doNotTrackState()

https://github.com/gradle/gradle/issues/9095

nickcaballero avatar Nov 02 '21 03:11 nickcaballero

Gradle 7.3 has been released and talks about npm specifically as something that would benefit from the doNotTrackState feature.

https://docs.gradle.org/7.3/userguide/more_about_tasks.html#sec:untracked_external_tool

nickcaballero avatar Nov 19 '21 23:11 nickcaballero

Opting out of tracking requires 7.3 and that requires groovy 3 which requires spock 2.0 which breaks (see #207)

So I'm moving this to 3.3

deepy avatar Jan 04 '22 09:01 deepy

@deepy Did you have an issue upgrading to 7.3? I don't recall having to do anything special to upgrade and have been using the doNotTrackState feature.

nickcaballero avatar Jan 06 '22 14:01 nickcaballero

@nickcaballero 7.3 upgrade was smooth but required upgrading to groovy 3 and that meant upgrading spock to 2.0 (from 1.3) which broke the RunWithMultipleVersionsExtension I looked into it briefly but didn't have enough time to fix it

So no issues with Gradle, but our test suite broke with the spock upgrade

deepy avatar Jan 06 '22 14:01 deepy