nxrocks

[Bug] Maven commands fail randomly in CI/CD workflow

Open tschaffter opened this issue 1 year ago • 3 comments

Plugin Name

@nxrocks/nx-spring-boot

Plugin Version

4.1.0

Nx Version

14.4.3

Expected Behaviour

./mvnw commands should complete successfully on the first execution.

Actual Behaviour

This project provides the script ./mvnw to run Maven commands when executing project targets (e.g. nx build <project>). In my main CI/CD workflow (GH Actions), targets that rely on ./mvnw randomly fail.

Here are the high-level commands that may fail in my CI/CD workflow.

      - run: yarn nx affected --target=build --parallel --max-parallel=3
      - run: yarn nx run-many --all --target=test --parallel --max-parallel=2

Here is an example of an error that randomly makes my CI/CD workflow fail.

Error:  Error executing Maven.
java.lang.NullPointerException
	at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
	at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
	at java.base/java.util.Properties.put(Properties.java:1301)
	at java.base/java.util.Properties.setProperty(Properties.java:229)
	at org.apache.maven.cli.MavenCli.populateProperties(MavenCli.java:1656)
	at org.apache.maven.cli.MavenCli.properties(MavenCli.java:612)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:282)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:196)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.apache.maven.wrapper.BootstrapMainStarter.start(BootstrapMainStarter.java:47)
	at org.apache.maven.wrapper.WrapperExecutor.execute(WrapperExecutor.java:156)
	at org.apache.maven.wrapper.MavenWrapperMain.main(MavenWrapperMain.java:72)
> nx run shared-java-challenge-util:build
Executing command: ./mvnw package 
Failed to execute command: ./mvnw package 
Error: Command failed: ./mvnw package 
    at checkExecSyncError (node:child_process:828:11)
    at execSync (node:child_process:899:15)
    at runBuilderCommand (/__w/challenge-registry/challenge-registry/node_modules/@nxrocks/common/src/lib/core/jvm/utils.js:19:38)
    at runBootPluginCommand (/__w/challenge-registry/challenge-registry/node_modules/@nxrocks/nx-spring-boot/src/utils/boot-utils.js:15:43)
    at /__w/challenge-registry/challenge-registry/node_modules/@nxrocks/nx-spring-boot/src/executors/build/executor.js:10:62
    at Generator.next (<anonymous>)
    at /__w/challenge-registry/challenge-registry/node_modules/tslib/tslib.js:117:75
    at new Promise (<anonymous>)
    at Object.__awaiter (/__w/challenge-registry/challenge-registry/node_modules/tslib/tslib.js:113:16)
    at buildExecutor (/__w/challenge-registry/challenge-registry/node_modules/@nxrocks/nx-spring-boot/src/executors/build/executor.js:8:20)
Error
    at /__w/challenge-registry/challenge-registry/node_modules/@nxrocks/nx-spring-boot/src/executors/build/executor.js:12:19
    at Generator.next (<anonymous>)
    at /__w/challenge-registry/challenge-registry/node_modules/tslib/tslib.js:117:75
    at new Promise (<anonymous>)
    at Object.__awaiter (/__w/challenge-registry/challenge-registry/node_modules/tslib/tslib.js:113:16)
    at buildExecutor (/__w/challenge-registry/challenge-registry/node_modules/@nxrocks/nx-spring-boot/src/executors/build/executor.js:8:20)
    at /__w/challenge-registry/challenge-registry/node_modules/@nrwl/tao/src/commands/run.js:147:23
    at Generator.next (<anonymous>)
    at /__w/challenge-registry/challenge-registry/node_modules/tslib/tslib.js:117:75
    at new Promise (<anonymous>)
    at __awaiter (/__w/challenge-registry/challenge-registry/node_modules/tslib/tslib.js:113:16)
    at runExecutorInternal (/__w/challenge-registry/challenge-registry/node_modules/@nrwl/tao/src/commands/run.js:127:34)
    at Object.<anonymous> (/__w/challenge-registry/challenge-registry/node_modules/@nrwl/tao/src/commands/run.js:219:54)
    at Generator.next (<anonymous>)
    at /__w/challenge-registry/challenge-registry/node_modules/tslib/tslib.js:117:75
    at new Promise (<anonymous>)

Restarting the failed job in GH Actions may lead to either a successful run or another failed run (it seems random).

Based on the log, could it be that maven or the maven wrapper (./mvnw) executions can not be reliably run in parallel? I run 2-3 tasks in parallel in my CI/CD workflow. The next troubleshooting step would be for me to test without running targets concurrently.

@tinesoft Have you ever observed this behavior with ./mvnw?

Steps to reproduce the behaviour

  1. Fork https://github.com/Sage-Bionetworks/challenge-registry.
  2. The GH workflow .github/workflows/ci.yml may randomly fail.

tschaffter Aug 01 '22 17:08

Hi @tschaffter

> @tinesoft Have you ever observed this behavior with ./mvnw?

No, I haven't, sorry.

> Based on the log, could it be that maven or the maven wrapper (./mvnw) executions can not be reliably run in parallel? I run 2-3 tasks in parallel in my CI/CD workflow.

I would say so too, yes. It has nothing to do with the plugin per se, but rather with the concurrency capabilities of the projects that were run.

> The next troubleshooting step would be for me to test without running targets concurrently.

Yes, that would be a good test indeed. You could also try to pinpoint, from the nx affected command output, which of your projects were actually built, and restrict the scope to those specific projects. Then, creating a simple test that builds those x projects in parallel, by calling for example:

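 const { execSync } = require('child_process');

 it('test parallel build', () => {
   // Repeat the parallel build to raise the odds of hitting the race condition
   for (let i = 0; i < 20; i++) {
     execSync('nx run-many --target=build --projects=project1,project2,projectx --parallel=3', { stdio: 'inherit' });
   }
 });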

in a loop, could help reproduce the issue locally...
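Because the failure is intermittent, it may also be useful to measure how often it happens rather than stop at the first error. The following is a minimal sketch, assuming a Node.js environment; `countFailures` is a hypothetical helper (not part of Nx or the plugin), and the command string and number of attempts are placeholders to replace with your own.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: runs `command` `attempts` times and returns the
// 1-based indices of the attempts that exited with a non-zero status.
function countFailures(command, attempts) {
  const failedAttempts = [];
  for (let i = 1; i <= attempts; i++) {
    // shell: true allows the whole command to be passed as a single string
    const result = spawnSync(command, { shell: true, stdio: 'pipe' });
    if (result.status !== 0) {
      failedAttempts.push(i);
    }
  }
  return failedAttempts;
}

// Example invocation (placeholder project names, not executed here):
// countFailures('yarn nx run-many --target=build --projects=project1,project2 --parallel=3', 20);
```

Comparing the failure count with `--parallel=3` against `--parallel=1` would give a rough indication of whether concurrency is the trigger.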

Looking at the source code of Maven itself can also help: https://github.com/apache/maven/blob/9b656c72d54e5bacbed989b64718c159fe39b537/maven-embedder/src/main/java/org/apache/maven/cli/MavenCli.java

tinesoft Aug 01 '22 19:08

I spent some time exploring the issue and came up with a solution. The issue stems from the fact that concurrent executions of Maven are not safe. For example, two or more concurrent executions of Maven may attempt to download the same dependency at the same time and save it to ~/.m2/repository, which I have seen result in errors (Issue A).

A similar error occurs when multiple project targets that rely on @nxrocks/nx-spring-boot are executed in parallel, which is the default when running an nx run-many command. By default, @nxrocks/nx-spring-boot relies on the Maven wrapper (mvnw) for the sake of portability. Any execution of ./mvnw triggers the download of the Maven binary if it is not yet installed, and installs it in ~/.m2/wrapper. In this context, errors may arise when parallel executions of @nxrocks/nx-spring-boot targets attempt to download the same version of the Maven binary at the same time (Issue B). Below is the stack trace of an occurrence of this issue.

$ nx run-many --all --target=build

...
Exception in thread "main" java.util.zip.ZipException: zip END header not found
	at java.base/java.util.zip.ZipFile$Source.findEND(ZipFile.java:1469)
	at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1477)
	at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1315)
	at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1277)
	at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:709)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:243)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:172)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:186)
Exception in thread "main" java.nio.file.NoSuchFileException: /root/.m2/wrapper/dists/apache-maven-3.8.6-bin/67568434/apache-maven-3.8.6-bin.zip.part
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
	at org.apache.maven.wrapper.Installer.unzip(Installer.java:207)
	at org.apache.maven.wrapper.Installer.createDist(Installer.java:110)
	at org.apache.maven.wrapper.WrapperExecutor.execute(WrapperExecutor.java:151)
	at org.apache.maven.wrapper.MavenWrapperMain.main(MavenWrapperMain.java:76)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:429)
	at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:266)
	at java.base/java.nio.file.Files.move(Files.java:1432)
	at org.apache.maven.wrapper.Installer.createDist(Installer.java:95)
	at org.apache.maven.wrapper.WrapperExecutor.execute(WrapperExecutor.java:151)
	at org.apache.maven.wrapper.MavenWrapperMain.main(MavenWrapperMain.java:76)
Exception in thread "main" java.nio.file.NoSuchFileException: /root/.m2/wrapper/dists/apache-maven-3.8.6-bin/67568434/apache-maven-3.8.6-bin.zip.part
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:429)
	at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:266)
	at java.base/java.nio.file.Files.move(Files.java:1432)
	at org.apache.maven.wrapper.Installer.createDist(Installer.java:95)
	at org.apache.maven.wrapper.WrapperExecutor.execute(WrapperExecutor.java:151)
	at org.apache.maven.wrapper.MavenWrapperMain.main(MavenWrapperMain.java:76)
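The NoSuchFileException on the `.zip.part` file is consistent with two processes sharing a single temporary file. The sketch below is a simplified model of that situation, not the Maven wrapper's actual code: the file names are illustrative, and the two "processes" are simulated by sequential steps in one Node.js script so that the interleaving is deterministic.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Simplified model (an assumption, not the wrapper's implementation):
// two installers download into the same ".part" file, then each tries
// to move it to its final name.
function simulateSharedPartFile() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'wrapper-race-'));
  const partFile = path.join(dir, 'apache-maven-bin.zip.part');
  const finalFile = path.join(dir, 'apache-maven-bin.zip');

  // Both "processes" finish writing to the same temporary file.
  fs.writeFileSync(partFile, 'download from process A');
  fs.writeFileSync(partFile, 'download from process B');

  // Process A moves the temporary file to its final name first.
  fs.renameSync(partFile, finalFile);

  // Process B attempts the same move, but the temporary file is gone.
  try {
    fs.renameSync(partFile, finalFile);
    return 'no error';
  } catch (err) {
    return err.code;
  } finally {
    // Clean up the scratch directory in every case.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

In this model the second move fails with `ENOENT`, the Node.js counterpart of Java's NoSuchFileException. The `zip END header not found` error would fit the related case where one process reads the archive while another is still writing it.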

Both issues become more likely as the number of Maven-based projects in the Nx workspace grows. Issue B can be avoided by using a globally installed Maven binary, which is achieved by passing the option --ignoreWrapper to the @nxrocks/nx-spring-boot executor.

Ultimately, I decided to find a solution to Issue A and Issue B that still enables projects to use different versions of Maven for enhanced project isolation. First, I added the following target to my Java projects:

    "prepare-java": {
      "executor": "@nrwl/workspace:run-commands",
      "options": {
        "commands": [
          "./mvnw dependency:go-offline -DexcludeGroupIds=org.sagebionetworks.challenge || true"
        ],
        "cwd": "apps/challenge-api-gateway"
      }
    },

This target downloads Maven (mvnw) and all the dependencies of the project. Because I have Java projects that depend on a shared local library, the command ./mvnw dependency:go-offline would fail because it would not find my local library, which has not even been built at this stage. I may have been able to solve this using Nx project dependencies, but this would complicate this "preparation" stage. Instead, I'm specifying the option -DexcludeGroupIds=org.sagebionetworks.challenge to prevent Maven from throwing an error when it attempts to download my shared local libraries. Unfortunately, the command mvn dependency:go-offline has several bugs, and one of them is that the option -DexcludeGroupIds is not evaluated. The workaround I found was to add || true to silence any errors that mvn dependency:go-offline may generate. I really don't like doing that, but since a failure of this prepare-java target should not affect subsequent targets (lint, build, etc.), I can live with this shortcoming.
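One possible alternative to `|| true`, sketched here under the assumption that a small Node.js script is acceptable in place of the shell one-liner, is to run the command, report a failure as a warning, and still let the target succeed. `runBestEffort` is a hypothetical helper, not something provided by Nx or @nxrocks/nx-spring-boot.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: runs a best-effort command. A failure is logged
// as a warning but is never propagated to the caller as an error.
function runBestEffort(command) {
  const result = spawnSync(command, { shell: true, stdio: 'inherit' });
  const ok = result.status === 0;
  if (!ok) {
    console.warn(`[prepare-java] "${command}" exited with status ${result.status}; continuing`);
  }
  return ok;
}

// Example invocation (not executed here):
// runBestEffort('./mvnw dependency:go-offline -DexcludeGroupIds=org.sagebionetworks.challenge');
```

The behavior is the same as with `|| true`, except that the failure stays visible in the CI log.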

In my CI workflow, I run the following commands to 1) install Maven and 2) install all the project dependencies (minus my shared local libraries) sequentially for the affected projects.

      - run: yarn nx affected --target=prepare-java --parallel=1
      - run: yarn nx affected --target=lint
      - run: yarn nx affected --target=build
      - run: yarn nx affected --target=test

This solves Issue B while maintaining Java project isolation and, most importantly, it solves Issue A too.

The release notes of the latest version of Maven mention improvements for concurrent builds. I have not evaluated the impact of these improvements on the reported issues, but it sounded like a good time to update the version of Maven used by the wrapper, and to update the wrapper itself at the same time.

For each Java project, I updated the .mvn/wrapper/maven-wrapper.properties file with references to the latest versions of Maven and the Maven wrapper:

distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.8.6/apache-maven-3.8.6-bin.zip
wrapperUrl=https://repo.maven.apache.org/maven2/org/apache/maven/wrapper/maven-wrapper/3.1.1/maven-wrapper-3.1.1.jar
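When many projects each carry their own maven-wrapper.properties, it can be handy to check which Maven version each one points to. The following is a minimal sketch, assuming the distributionUrl ends in `apache-maven-<version>-bin.zip` as in the lines above; `mavenVersionFromWrapperProperties` is a hypothetical helper name.

```javascript
// Hypothetical helper: extracts the Maven version from the distributionUrl
// entry in the text of a maven-wrapper.properties file.
// Returns null when no matching distributionUrl line is found.
function mavenVersionFromWrapperProperties(text) {
  for (const line of text.split(/\r?\n/)) {
    const match = line.match(
      /^\s*distributionUrl\s*=.*apache-maven-([0-9][^/]*?)-bin\.zip\s*$/
    );
    if (match) {
      return match[1];
    }
  }
  return null;
}

// Example usage with a file read from disk (path is a placeholder):
// const fs = require('fs');
// const text = fs.readFileSync('apps/my-app/.mvn/wrapper/maven-wrapper.properties', 'utf8');
// console.log(mavenVersionFromWrapperProperties(text));
```

Running this over every Java project in the workspace and comparing the results would show whether any wrapper was missed during the update.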

Once again, anyone using this plugin should face Issue B at some point as the number of Java projects and targets in the workspace increases. The same goes for Issue A, unless Maven finds a way to safely handle the case where concurrent executions attempt to modify the same file at the same time. Because of the pseudo-random nature of the issue, it may be worth adding a note about it to the README of this plugin.

tschaffter Aug 07 '22 03:08

Hi Thomas,

Thanks for your continued interest in the plugin and for sharing the results of your investigation so thoroughly! The issues you create on my repo are always very well detailed. For that, I officially name you my "n°1 issue reporter"!

I will look into a way to implement (and document) your workaround, with:

  • a new go-offline executor (that will download everything locally)

My only concern so far is compatibility when using Gradle (instead of Maven) as the build system. There is an offline mode in Gradle too, but it works differently from Maven's: it requires the --offline parameter to be passed along with each command, whereas for Maven the offline mode can be activated once (via mvnw dependency:go-offline) and then benefits subsequent commands.

I'm still researching the best way to achieve that in a way that would be transparent for end users of the plugin, whether they use Maven or Gradle.

tinesoft Aug 11 '22 20:08