`Could not acquire write lock for…` errors

Open jglick opened this issue 2 years ago • 34 comments

I just upgraded mvnd to 1.0-m6-m39 via Snap from 0.9.0 (using Java 11) and ran a build of my reactor with 27 modules as usual (max parallelism 11). It spat out some errors such as

Could not acquire write lock for '~/.m2/repository/.locks/com.github.spotbugs~spotbugs-annotations~4.7.3.lock'
Could not acquire write lock for '~/.m2/repository/.locks/commons-collections~commons-collections~3.2.2.lock'

and the build failed partway through

Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0:enforce (enforce-bytecode-version) on project …: Execution enforce-bytecode-version of goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0:enforce failed: Could not acquire write lock for '~/.m2/repository/.locks/groupId~artifactId~1.234.lock' -> [Help 1]
Failed to execute goal some:plugin:1.2:some-mojo (default-some-mojo) on project …: Execution default-some-mojo of goal some:plugin:1.2:some-mojo failed: Could not acquire write lock for '~/.m2/repository/.locks/groupId2~artifactId2~5.678.lock' -> [Help 1]
Could not acquire write lock for '~/.m2/repository/.locks/groupId3~artifactId3~9.012.lock'

When I ran the build again, it passed, so I presume this was some sort of race condition, possibly involving artifact downloads (I had pulled in various POM updates since the last local build of the project).

No further details were provided, and I was not using -e so there was no stack trace giving context. I checked ~/.m2/mvnd/registry/1.0-m6/daemon-*.log which did not really add any more information:

Dispatch message: ExecutionFailure{projectId='…', halted=true, exception='java.lang.IllegalStateException: Could not acquire write lock for '~/.m2/repository/.locks/com.github.spotbugs~spotbugs-annotations~4.7.3.lock''}

If nothing else, the Throwable.toString of the cause ought to be included in the top-level error message I think.

jglick avatar Apr 07 '23 17:04 jglick

I faced the same issue on the 1.0-m7 Windows version.

Apache Maven Daemon (mvnd) 1.0-m7 windows-amd64 native client (b2ef5d81997adbcdb72dc8c5603722538fa641fe)
Terminal: org.jline.terminal.impl.jansi.win.JansiWinSysTerminal
Apache Maven 4.0.0-alpha-7 (bf699a388cc04b8e4088226ba09a403b68de6b7b)
Maven home: C:\work\tools\mvnd-0.7.1-windows-amd64\mvn
Java version: 11.0.15, vendor: Eclipse Adoptium, runtime: C:\work\tools\java\jdk-11.0.15.10-hotspot
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

I was able to get a full stack trace:

Suppressed: java.lang.IllegalStateException: Attempt 1: Could not acquire write lock for 'C:\work\repository\.locks\artifact~antlr~antlr~2.7.6.lock' in 30 SECONDS
        at org.eclipse.aether.internal.impl.synccontext.named.NamedLockFactoryAdapter$AdaptedLockSyncContext.acquire (NamedLockFactoryAdapter.java:202)
        at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve (DefaultArtifactResolver.java:271)
        at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts (DefaultArtifactResolver.java:259)
        at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies (DefaultRepositorySystem.java:352)
        at org.apache.maven.project.DefaultProjectDependenciesResolver.resolve (DefaultProjectDependenciesResolver.java:187)
        at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies (LifecycleDependencyResolver.java:242)
        at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectArtifacts (LifecycleDependencyResolver.java:193)
        at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectDependencies (LifecycleDependencyResolver.java:131)
        at org.apache.maven.lifecycle.internal.MojoExecutor.ensureDependenciesAreResolved (MojoExecutor.java:361)
        at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:318)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:217)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:178)
        at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:77)
        at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:166)
        at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:114)
        at io.takari.maven.builder.smart.SmartBuilderImpl.buildProject (SmartBuilderImpl.java:209)
        at io.takari.maven.builder.smart.SmartBuilderImpl$ProjectBuildTask.run (SmartBuilderImpl.java:81)
        at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:515)
        at java.util.concurrent.FutureTask.run (FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
        at java.lang.Thread.run (Thread.java:829)

sinyagin avatar Aug 02 '23 00:08 sinyagin

@sinyagin are you running on WSL? That could be related to https://github.com/apache/maven-mvnd/issues/755

gnodet avatar Aug 02 '23 07:08 gnodet

I am getting the same using Named Locks on Hazelcast, on a Docker container though.

-Daether.syncContext.named.factory=semaphore-hazelcast -Daether.syncContext.named.nameMapper=gav

I have all the required JARs in "lib/ext" and the Hazelcast configuration in ${maven.conf}/hazelcast.xml

eliasbalasis avatar Aug 10 '23 16:08 eliasbalasis

FYI.

I have discovered that there is probably some problem with the Maven resolver implementation.

I initially used apache-maven-3.8.8-resolver-1.9.7 but neither Redisson nor Hazelcast worked. Now I am using apache-maven-3.8.8-resolver-1.8.2 and Redisson semaphore works but Hazelcast still doesn't work while Redisson rwlock doesn't work either.

I am guessing this will be all eventually fixed in Maven 3.9 as soon as it matures first.

The alternative is "-Daether.connector.basic.threads=1 -Daether.connector.resumeDownloads=false"

eliasbalasis avatar Aug 10 '23 20:08 eliasbalasis

$ mvnd -v
Apache Maven Daemon (mvnd) 1.0-m7 windows-amd64 native client (b2ef5d81997adbcdb72dc8c5603722538fa641fe)
Terminal: org.jline.terminal.impl.jansi.win.JansiWinSysTerminal
Apache Maven 3.9.3 (21122926829f1ead511c958d89bd2f672198ae9f)
Maven home: D:\projects\salog\software\mvnd\mvn
Java version: 17.0.8, vendor: Eclipse Adoptium, runtime: D:\projects\salog\software\java
Default locale: en_US, platform encoding: UTF-8
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

But I also get this:

Suppressed: java.lang.IllegalStateException: Attempt 1: Could not acquire write lock for 'C:\Users\hohwille\.m2\repository\.locks\artifact~com.caucho~com.springsource.com.caucho~3.2.1.lock' in 30 SECONDS
        at org.eclipse.aether.internal.impl.synccontext.named.NamedLockFactoryAdapter$AdaptedLockSyncContext.acquire (NamedLockFactoryAdapter.java:202)
        at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve (DefaultArtifactResolver.java:271)
        at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts (DefaultArtifactResolver.java:259)
        at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies (DefaultRepositorySystem.java:352)

Windows Filesystem and file-locking is a disaster. However, with regular mvn the error does not occur, not even when I run parallel builds (e.g. -T 2C). Please also note that with mvnd, too, the error goes away if I do not specify concurrent build (no -T option).

hohwille avatar Sep 01 '23 08:09 hohwille

Windows Filesystem and file-locking is a disaster

I couldn't agree more.

the error does not occur, not even when I run parallel builds

Trust me, it is purely non-deterministic, I thought the same at first. Even Named Locks don't seem to be working perfectly yet. "-Daether.connector.basic.threads=1 -Daether.connector.resumeDownloads=false" hasn't worked perfectly either. From where I stand, this is a dead end at the moment. My hopes lie with Maven 3.9

the error goes away if I do not specify concurrent build (no -T option)

Indeed, the concurrent Maven local repository locking is only effective during parallel builds.

eliasbalasis avatar Sep 01 '23 09:09 eliasbalasis

I was now able to reproduce the lock errors also with pure maven and without mvnd. Therefore I created MNG-7868. IMHO this issue here is kind of invalid as it is a bug in maven itself and not in mvnd.

hohwille avatar Sep 05 '23 12:09 hohwille

Locking in Maven aims to solve two things:

  • parallel build coordination (within one Maven parallel building process)
  • shared local repository across several Maven (ST or MT, parallel) processes

For brevity we call them

  • ST, single threaded, the "usual" way of Maven working,
  • MT, multi threaded, or "parallel", either using -T with or without "smart builder"/mvnd and
  • MP, multi process case, when local repository is shared across multiple Maven ST/MT processes (on single or multiple hosts).

HOW locking is set up (especially for which scenario, MT/MP) is left to user, some examples:

  • the default locking (since 3.9.0), rwlock-local, is (JVM) local, as the name says, so it handles only MT builds
  • the file-lock can handle MP cases (on single host, or on properly set up NFS volume)
  • for larger scale, usually CI use case with shared local repositories, we recommend the Hazelcast/Redisson solutions

By the way, for non-mvnd users I'd personally recommend to NOT use the "vanilla" -T builder (that comes with Maven), but rather use the superior Takari Smart Builder (see README). This will make Maven parallel builds behave similarly as mvnd (also uses Smart builder) sans the persistent daemon and logging feature.

I would really like to see a reproducer project that produces this kind of error, but let me explain what happens here...

By default, the parameter aether.syncContext.named.time (30 seconds) is the amount of time a synchronization context shall wait to obtain a needed lock. In latest Maven/mvnd releases the message is actually improved to clearly show that (as can be seen above in comments, but not in original issue description: "Could not acquire write lock for $lockName in 30 SECONDS"). So, there is no cause per se, the cause IS the timeout.
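To make the "the cause IS the timeout" point concrete, here is a minimal Java sketch of a timed write-lock attempt. It is an illustration only: the class and method names are hypothetical, it uses a plain `ReentrantReadWriteLock` rather than the resolver's named locks, and the message format is merely modelled on the one quoted in this thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: names here are hypothetical and are not the resolver's API.
final class LockTimeoutSketch {

    // Wait up to `time` for the write lock; on timeout, fail with a message
    // shaped like the one quoted in this thread.
    static void acquireWrite(ReentrantReadWriteLock lock, String name, long time, TimeUnit unit) {
        boolean acquired;
        try {
            acquired = lock.writeLock().tryLock(time, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            throw new IllegalStateException(
                    "Could not acquire write lock for '" + name + "' in " + time + " " + unit);
        }
    }

    // One thread holds the lock for longer than the waiter is willing to wait.
    static String demo(long waitMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.writeLock().lock();
            held.countDown();
            try {
                release.await(); // keep holding until the waiter has given up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        });
        holder.start();
        try {
            held.await();
            acquireWrite(lock, "artifact:junit:junit:4.13.2", waitMillis, TimeUnit.MILLISECONDS);
            lock.writeLock().unlock();
            return "acquired";
        } catch (IllegalStateException e) {
            return e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            release.countDown();
            try {
                holder.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(200));
    }
}
```

In this sketch nothing is "broken" about the lock itself: the waiter simply gives up because the holder did not release in time, which is the situation the 30-second default describes.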

Sadly, the SyncContext API was defined long before NamedLocks (which are actually used as the "low level fine grained locking implementation"), and I "inherited" it. The major problem with this API is its "coarseness". As can be seen, it "grabs" all artifacts at once, so if even one overlapping artifact is requested by another context, mutual exclusion (in the case of an exclusive lock) is inevitable.

My current assumption/focus is "hot artifacts": artifacts that are highly referenced (for example slf4j-api in a project, or some "api" module that has a zillion downstream plugin modules building against it; usually "star shaped" projects, at least when looking at dependencies). Because of the SyncContext behaviour described above, module dependencies that should be resolved in parallel may end up being resolved serially, and, especially on larger projects, the "loser" threads end up congested, waiting more than 30 seconds, which causes the timeout.
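The coarseness and "hot artifact" effect can be sketched in a few lines of Java. This is a hypothetical model, not resolver code: each context tries to take an exclusive lock on every artifact it needs (here one `Semaphore` per name, with a per-lock timeout), so two modules that share only a single artifact still exclude each other. The module and artifact names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a coarse "lock everything I need" context; not resolver code.
final class CoarseContextSketch {
    private final Map<String, Semaphore> locks = new ConcurrentHashMap<>();

    // Acquire every named lock (in sorted order, to avoid deadlock) or none.
    // Returns false if any single lock cannot be taken within the timeout.
    boolean acquireAll(Collection<String> names, long timeoutMillis) {
        List<Semaphore> taken = new ArrayList<>();
        try {
            for (String name : new TreeSet<>(names)) {
                Semaphore s = locks.computeIfAbsent(name, k -> new Semaphore(1));
                if (!s.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    taken.forEach(Semaphore::release); // roll back partial acquisition
                    return false;
                }
                taken.add(s);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            taken.forEach(Semaphore::release);
            return false;
        }
    }

    // Release the locks previously taken by a successful acquireAll(names, ...).
    void releaseAll(Collection<String> names) {
        for (String name : new TreeSet<>(names)) {
            locks.get(name).release();
        }
    }

    public static void main(String[] args) {
        CoarseContextSketch ctx = new CoarseContextSketch();
        List<String> moduleA = List.of("org.slf4j:slf4j-api", "example:a-lib");
        List<String> moduleB = List.of("org.slf4j:slf4j-api", "example:b-lib");
        List<String> moduleC = List.of("example:c-lib");

        System.out.println("A acquired: " + ctx.acquireAll(moduleA, 50));
        // B overlaps with A only on slf4j-api, yet it cannot proceed at all.
        System.out.println("B acquired: " + ctx.acquireAll(moduleB, 50));
        // C shares nothing with A, so it is unaffected.
        System.out.println("C acquired: " + ctx.acquireAll(moduleC, 50));
        ctx.releaseAll(moduleA);
        System.out.println("B acquired after A released: " + ctx.acquireAll(moduleB, 50));
    }
}
```

With many modules queued behind one widely shared artifact, each waiter's wait time grows with the number of contexts ahead of it, which is how a fixed timeout can be exceeded on larger projects.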

The current (bad) workaround for this is to raise the time limit using -Daether.syncContext.named.time=40 (the default time unit is seconds) and "experiment", as this value theoretically depends on project layout and size. But it is bad: you want fast builds, and this actually slows them down even more.


An example of a perf test against Apache Camel is here: https://cstamas.github.io/camel-perftest/ Sadly, I could not make it produce these errors.


So any (hopefully OSS) "reproducer project" is welcome, to sort out last bits of Maven locking. TIA!

cstamas avatar Sep 05 '23 12:09 cstamas

This is a very good direction.

I have witnessed locking failures on larger projects even with Redisson and Hazelcast (as mentioned in the past). These are probably timeouts, in which case the plain error message I am getting, Could not acquire write lock for ..., is badly worded. However, it seems I missed the error detail, Could not acquire write lock for 'C:\work\repository\.locks\artifact~antlr~antlr~2.7.6.lock' in 30 SECONDS, which explains the problem.

I will increase the time limit using -Daether.syncContext.named.time and get back to you.

eliasbalasis avatar Sep 05 '23 12:09 eliasbalasis

IMHO this issue here is kind of invalid as it is a bug in maven itself and not in mvnd.

I agree with this assessment, and it is perfectly clear: mvnd only suffers from this bug (as it pressures the locking most), which is actually deep in the resolver.

cstamas avatar Sep 05 '23 12:09 cstamas

And one more note: with latest Maven 3.9.4 you have option to collect "diagnostic data" for locking as well, and in case of those timeouts, it will spit out the diag data, see here https://maven.apache.org/resolver/maven-resolver-named-locks/index.html#diagnostic-collection-in-case-of-failures

Note: collecting diag data has impact on heap usage, so you may need to tweak it, if you hit OOM.

The diag will either:

  • help you to identify your "hot artifact" (or dismiss it)
  • or at least identify any possible congestion

Example output (lists named lock used, active lock count, and for each named lock the acquired steps): https://gist.github.com/cstamas/fe1bd5b73c3e02877d9647c00aa40831

Big fat note: use it ONLY with Maven 3.9.4 (actually, with resolver 1.9.14+), as before, it emitted WRONG data (see MRESOLVER-380). This means mvnd is out of scope, as latest m7 uses older resolver....

cstamas avatar Sep 05 '23 12:09 cstamas

Noted, the diagnostics work only with Maven 3.9.4 and resolver 1.9.14+

However, the timeout change didn't work: using -Daether.syncContext.named.factory=semaphore-hazelcast -Daether.syncContext.named.nameMapper=gav -Daether.syncContext.named.time=60, SmartBuilder and Maven 3.8.8-resolver-1.8.2, I am still getting the same errors:

12:53:38.601 E Internal error: java.lang.IllegalStateException: Could not acquire write lock for 'artifact:XXX' -> [Help 1]
org.apache.maven.InternalErrorException: Internal error: java.lang.IllegalStateException: Could not acquire write lock for 'artifact:XXX'
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:121)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:963)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:296)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
Caused by: java.lang.IllegalStateException: Could not acquire write lock for 'artifact:XXX'
    at org.eclipse.aether.internal.impl.synccontext.named.NamedLockFactoryAdapter$AdaptedLockSyncContext.acquire(NamedLockFactoryAdapter.java:165)
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:233)
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:212)
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveArtifact(DefaultRepositorySystem.java:272)
    at org.apache.maven.project.ProjectModelResolver.resolveModel(ProjectModelResolver.java:192)
    at org.apache.maven.project.ProjectModelResolver.resolveModel(ProjectModelResolver.java:242)
    at org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:1150)
    at org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:916)
    at org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:361)
    at org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:267)
    at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:448)
    at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:414)
    at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:377)
    at org.apache.maven.graph.DefaultGraphBuilder.collectProjects(DefaultGraphBuilder.java:414)
    at org.apache.maven.graph.DefaultGraphBuilder.getProjectsForMavenReactor(DefaultGraphBuilder.java:405)
    at org.apache.maven.graph.DefaultGraphBuilder.build(DefaultGraphBuilder.java:82)
    at org.apache.maven.DefaultMaven.buildGraph(DefaultMaven.java:535)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:220)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    ... 11 common frames omitted

Should I wait for an opportunity to run with Maven 3.9.x, or am I missing something?

eliasbalasis avatar Sep 05 '23 13:09 eliasbalasis

Sorry, all this above is true for Maven 3.9.4 and latest 1.9.14+ resolver...

Am really unsure what Maven 3.8.8-resolver-1.8.2 is.... Is it @michael-o product? https://maven.apache.org/resolver/maven-3.8.x.html If so, then resolver 1.8.x should be just avoided, and even the 1.9.7 used in there is too old (a lot of bugs were fixed since....), latest release is 1.9.15.

What stops you from moving to Maven 3.9.4?

cstamas avatar Sep 05 '23 13:09 cstamas

It is indeed @michael-o's product at https://maven.apache.org/resolver/maven-3.8.x.html

Nothing stops us from switching to Maven 3.9.x, besides workload, commitments and priorities.

I have planned for this to happen later this year.

eliasbalasis avatar Sep 05 '23 13:09 eliasbalasis

@cstamas @eliasbalasis Re: Maven 3.8.x with Resolver 1.9.x: Yes, I do produce and sign those deliverables. Cherry-picked commits allowing people to access the new Resolver if they can't/want/whatever upgrade. I will upgrade those to 1.9.15 shortly.

michael-o avatar Sep 09 '23 07:09 michael-o

@cstamas We are using Apache Resolver 1.9.13 in a different project and started to see the locking issues as well. Can it be related to a slow file system? When was this locking introduced? Given that this is outside of mvnd should the issue be moved to the Maven Resolver project?

java.lang.IllegalStateException: Could not acquire lock(s)
        at org.eclipse.aether.internal.impl.synccontext.named.NamedLockFactoryAdapter$AdaptedLockSyncContext.acquire(NamedLockFactoryAdapter.java:219)
        at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:271)
        at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:259)
        at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:242)
        at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:231)
        at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.readArtifactDescriptor(DefaultArtifactDescriptorReader.java:172)
        at org.eclipse.aether.internal.impl.collect.df.DfDependencyCollector.resolveCachedArtifactDescriptor(DfDependencyCollector.java:382)
        at org.eclipse.aether.internal.impl.collect.df.DfDependencyCollector.getArtifactDescriptorResult(DfDependencyCollector.java:368)
        at org.eclipse.aether.internal.impl.collect.df.DfDependencyCollector.processDependency(DfDependencyCollector.java:218)
        at org.eclipse.aether.internal.impl.collect.df.DfDependencyCollector.processDependency(DfDependencyCollector.java:156)
        at org.eclipse.aether.internal.impl.collect.df.DfDependencyCollector.process(DfDependencyCollector.java:138)
        at org.eclipse.aether.internal.impl.collect.df.DfDependencyCollector.doCollectDependencies(DfDependencyCollector.java:108)
        at org.eclipse.aether.internal.impl.collect.DependencyCollectorDelegate.collectDependencies(DependencyCollectorDelegate.java:222)
        at org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.collectDependencies(DefaultDependencyCollector.java:87)
        at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:327)
        at ...
        Suppressed: java.lang.IllegalStateException: Attempt 1: Could not acquire write lock for 'artifact:junit:junit:4.13.2' in 30 SECONDS
                at org.eclipse.aether.internal.impl.synccontext.named.NamedLockFactoryAdapter$AdaptedLockSyncContext.acquire(NamedLockFactoryAdapter.java:202)
                ... 30 more
        Suppressed: java.lang.IllegalStateException: Attempt 2: Could not acquire write lock for 'artifact:junit:junit:4.13.2' in 30 SECONDS
                at org.eclipse.aether.internal.impl.synccontext.named.NamedLockFactoryAdapter$AdaptedLockSyncContext.acquire(NamedLockFactoryAdapter.java:202)
                ... 30 more

guw avatar Sep 13 '23 05:09 guw

@guw this is definitely maven-resolver issue, but cannot say more without knowing more. Please try to enable "lock diag" (set Java system property --not Maven property!) aether.named.diagnostic.enabled=true.

Edit: locking was introduced in Resolver 1.8.x but reworked/generalized in 1.9.x lineage.

cstamas avatar Oct 05 '23 07:10 cstamas

@cstamas @eliasbalasis Re: Maven 3.8.x with Resolver 1.9.x: Yes, I do produce and sign those deliverables. Cherry-picked commits allowing people to access the new Resolver if they can't/want/whatever upgrade. I will upgrade those to 1.9.15 shortly.

@michael-o Did you have any opportunity to publish the latest version of "Resolver" ?

@guw this is definitely maven-resolver issue, but cannot say more without knowing more. Please try to enable "lock diag" (set Java system property --not Maven property!) aether.named.diagnostic.enabled=true.

Edit: locking was introduced in Resolver 1.8.x but reworked/generalized in 1.9.x lineage.

@guw It is with Java system property that I experimented but I never got any detailed output (using https://maven.apache.org/resolver/maven-3.8.x.html)

Also as @cstamas mentioned further up:

And one more note: with latest Maven 3.9.4 you have option to collect "diagnostic data" for locking as well, and in case of those timeouts, it will spit out the diag data, see here https://maven.apache.org/resolver/maven-resolver-named-locks/index.html#diagnostic-collection-in-case-of-failures

... ...

Big fat note: use it ONLY with Maven 3.9.4 (actually, with resolver 1.9.14+), as before, it emitted WRONG data (see MRESOLVER-380). This means mvnd is out of scope, as latest m7 uses older resolver....

It seems Maven 3.9.x is the only effective way. However, I do vaguely recall having tried Maven 3.9.x at some point and discovering some incompatibilities that I didn't have the time to explore, which may imply that we are at a dead end. I have planned an upgrade to Maven 3.9.x before the end of this year, at which point I expect to stumble upon any incompatibilities, and I will let you know.

This is not "mvnd" related though, without a shadow of a doubt. Should we post this somewhere else?

eliasbalasis avatar Oct 08 '23 10:10 eliasbalasis

Planning to package today.

michael-o avatar Oct 08 '23 13:10 michael-o

@eliasbalasis Distros delivered. Please check them out from the dist dev space.

michael-o avatar Oct 08 '23 17:10 michael-o

@michael-o many thanks for making the effort, it helps a lot.

I will try it and see how it goes.

eliasbalasis avatar Oct 09 '23 13:10 eliasbalasis

@eliasbalasis could you please give us feedback on why the official 3.9.x release did not work for you? I think the real path should be forward, instead of sticking to a one-man, non-ASF customized Maven distribution (which, as the example above shows, just makes things worse, as both users and devs are confused about why things do not work while they should [IF it were the official distro])...

cstamas avatar Oct 11 '23 19:10 cstamas

@cstamas I agree with moving forward but I hope you understand that such transformations often have a cost that has to be prioritized, particularly when the relevant systems are used by multiple teams on multiple projects to eliminate disruption of any operations.

That said, I do recall having tried running some of our builds on Maven 3.9.x as an experiment in the past, but I cannot remember the exact errors off the top of my head.

However, like I said, I have planned for this transformation to happen before the end of this year, as part of our yearly technical debt handling effort, at which point I will know the exact problem and I will report it, if it is still effective.

eliasbalasis avatar Oct 12 '23 07:10 eliasbalasis

As promised, I am now transforming our build systems to make use of Maven 3.9.6 and named locks for our parallel builds.

The first indications are promising, using a remote Redisson cache. However, it will take a while to observe the system under heavy load.

I will be reporting findings here.

eliasbalasis avatar Mar 21 '24 07:03 eliasbalasis

Unfortunately, after a little while the problem is still present.

Could not acquire lock(s)

The only option that seems to be working is -Daether.syncContext.named.factory=file-lock -Daether.syncContext.named.nameMapper=file-gav, which is not great but at least it hasn't reproduced the problem for a considerably long period of time.

I will try -Daether.named.diagnostic.enabled=true

eliasbalasis avatar Mar 21 '24 09:03 eliasbalasis

-Daether.named.diagnostic.enabled=true did not reveal much, except detailed list of the active locks.

However, I was using the rwlock-redisson named locks factory.

I will try with other named lock factories.

eliasbalasis avatar Mar 21 '24 10:03 eliasbalasis

semaphore-redisson named locks factory did not work either.

I will try with the Hazelcast named locks factory.

eliasbalasis avatar Mar 21 '24 11:03 eliasbalasis

The point of "diagnostic" is exactly that, please paste it into gist and put it here....

cstamas avatar Mar 21 '24 12:03 cstamas

@cstamas, I cannot share the build output because it contains references to private components.

Specifically for the "redisson" named lock implementation, trust me when I say that the build output did not reveal much except a detailed list of the active locks. It contains only the list of locks at the time of failure and no stack trace or any detailed analysis of the errors. Therefore it is pointless for me to try to clean up the build output by removing the references to private components.

I observed this repeatedly using the "gav" and "discriminating" name mappers (see https://maven.apache.org/resolver/maven-resolver-named-locks/index.html).

However, the "static" name mapper seems to be the only one working without producing any locking errors. This is great progress from where I stand.
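As a rough illustration of what changes between name mappers, the Java sketch below compares lock granularity only. It is not the resolver's implementation: the `GAV` function mimics the `artifact:groupId:artifactId:version` names seen in the stack traces in this thread, and the `STATIC` function assumes, as the name suggests, that every artifact maps to one shared lock name (the literal `"static"` is a placeholder).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical comparison of lock-name granularity; not the resolver's name mappers.
final class NameMapperSketch {

    // One lock name per artifact, shaped like the names in the stack traces above.
    static final Function<String[], String> GAV = gav -> "artifact:" + String.join(":", gav);

    // Assumption: a "static" mapper gives every artifact the same lock name.
    static final Function<String[], String> STATIC = gav -> "static";

    // Collect the distinct lock names a set of artifacts would contend on.
    static Set<String> lockNames(List<String[]> artifacts, Function<String[], String> mapper) {
        Set<String> names = new LinkedHashSet<>();
        for (String[] gav : artifacts) {
            names.add(mapper.apply(gav));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String[]> artifacts = List.of(
                new String[] {"junit", "junit", "4.13.2"},
                new String[] {"antlr", "antlr", "2.7.6"});
        System.out.println("gav:    " + lockNames(artifacts, GAV));
        System.out.println("static: " + lockNames(artifacts, STATIC));
    }
}
```

Under that assumption, a single shared name means every context contends for the same lock, so resolution is effectively serialized. The sketch does not explain why the errors disappeared with "static"; it only shows how the two settings differ.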

I am now trying to experiment with "hazelcast" named lock implementation and I intend to share the outcome.

eliasbalasis avatar Mar 24 '24 09:03 eliasbalasis

At least you can do the following, as written here: https://github.com/apache/maven-mvnd/issues/836#issuecomment-1706549388

You should be able to identify your "hot artifact", if you have one.

cstamas avatar Mar 24 '24 10:03 cstamas