gradle-node-plugin icon indicating copy to clipboard operation
gradle-node-plugin copied to clipboard

OutputFiles should only look at package.json files to match yarn

Open anuraaga opened this issue 5 years ago • 7 comments

Currently, the YarnInstallTask outputs contain the whole node_modules directory - https://github.com/node-gradle/gradle-node-plugin/blob/master/src/main/groovy/com/moowork/gradle/node/yarn/YarnInstallTask.groovy#L55

I wonder if anyone really gets benefit out of this - it takes a really long time for Gradle to scan through all the files in node_modules to do its up-to-date check (at least on Windows). From what I understand, yarn checks for up-to-date by looking at package.json files only, not every single source file. Would it make sense that by default, this plugin also only adds package.json to the output files?

anuraaga avatar Nov 25 '19 06:11 anuraaga

Out of curiosity, how many files do you have in node_modules and how long does the scanning take?

deepy avatar Nov 25 '19 07:11 deepy

Running find .//. ! -name . -print | grep -c // I get 89809 files. The package.json is here. The Gradle up-to-date check takes about a minute. yarn itself finishes an up-to-date check in a couple of seconds.

This is one of the smaller projects I work on, I would imagine many have an order of magnitude more number of files.

I can work around by excluding files like here so personally am fine with it

https://github.com/line/armeria/pull/2280/files#diff-d5a4015ce1eb6ef9e5374a6cfc4d28a7

But I'm wondering if the current behavior is a reasonable default - I'd expect large node_modules to be the norm, not exception and people should be running into this. Might even consider disabling up-to-date on YarnInstallTask completely - in general yarn will know best how to handle up-to-date for what it does, and it always runs very fast.

anuraaga avatar Nov 25 '19 07:11 anuraaga

The project I am working on contains about 70 000 files (according to the comment in your command above). The up-to-date check is really fast on my laptop (less than 1 second when nothing changed when reusing a Gradle daemon, about 30 seconds with a fresh Gradle instance).

I wonder why I don't get the same behavior as you. I use NPM and not Yarn but it should not change anything since the up-to-date detection does not depend on the task's behavior.

Is the up-to-date detection really fast with your workaround?

Even if Yarn handles the up-to-date checking faster, it still makes sense to declare the node_modules directory as an output since it is true. If we don't do it, if you manually remove the node_modules directory, Yarn will not run, whereas it should.

Note that we introduced the option you use in your workaround to workaround another issue described in #38.

bsautel avatar Dec 06 '19 15:12 bsautel

Did you try on Windows? I suspect it only affects that - Windows file systems don't have mtime so Gradle needs to hash all files.

Up-to-date detection is quite fast (~1s) when restricted to package.json files. While it could make sense to do such a restriction only on Windows, given yarn does this on all OS's, I don't see a reason not to match its behavior.

The alternative idea for not declaring node_modules is to not declare any inputs/outputs or upToDateWhen { false }, so removal of node_modules would also still allow it to run. This is because it's likely yarn would always do a better check than Gradle as it has domain-specific knowledge about its task (in this case that checking only package.json tends to be "good enough"). But I don't think this has huge advantages over tweaking the current Gradle outputs to restrict to just looking at package.json.

anuraaga avatar Dec 07 '19 07:12 anuraaga

I don't use Windows, it is probably the reason why I don't get this behavior.

As far as I understand, you suggest to disable the up-to-date detection. This will cause the task to run always, but it is not a big deal since yarn is quite fast (a few seconds according to you).

Should we do that always or only for Windows? Should that be the default behavior or configurable?

bsautel avatar Dec 07 '19 10:12 bsautel

My recommendation is to disable up-to-date detection by default on all systems, to have standard behavior on them, but make it configurable in case users do want to tweak it.

My alternative recommendation if this doesn't sound great is to updated the current OutputFiles output to filter to package.json only which is similar to what yarn does and should be reasonable too.

Hope that makes sense :)

anuraaga avatar Dec 07 '19 10:12 anuraaga

@anuraaga which version of the plugin do you use?

The version 2.2.0 introduced a big performance drop due to up-to-date checking which is due to the fact we now declare a @OutputFiles file tree instead of an @OutputDirectory file.

If you are using the version 2.2.0, could you please test the 2.1.1 to see whether it is better or not?

I don't know about the limitations regarding up-to-date detection on Windows (due to the file system). Maybe this will not change anything because Gradle has the same behavior on Windows regarding @OutputFiles and @OutputDirectory?

bsautel avatar Jan 22 '20 10:01 bsautel