ValidateLayerConsistencyTest is flaky
Apache NetBeans version
Apache NetBeans latest daily build
What happened
The ValidateLayerConsistencyTest seems to be flaky:
https://github.com/apache/netbeans/blob/2bdbb637653cf19179963a3150add8f99d8b6977/platform/o.n.core/test/qa-functional/src/org/netbeans/core/validation/ValidateLayerConsistencyTest.java
Sometimes fails in an assertion:
2022-06-08T18:30:24.0613903Z [junit] FINE [org.openide.util.lookup.implspi.ActiveQueue]: Got dequeued reference org.openide.util.lookup.AbstractLookup$ReferenceToResult@597289e4
2022-06-08T18:30:24.0614562Z [junit] FINE [org.openide.util.lookup.implspi.ActiveQueue]: Got dequeued reference org.openide.util.lookup.AbstractLookup$ReferenceToResult@67cb9a30
2022-06-08T18:30:27.1650822Z [junit] java.lang.AssertionError: Has to be NbRepository: org.openide.filesystems.Repository@5cf749fa Initialization stack: java.lang.Exception: initialized
2022-06-08T18:30:27.1651908Z [junit] at org.openide.filesystems.Repository.<init>(Repository.java:326)
2022-06-08T18:30:27.1652502Z [junit] at org.openide.filesystems.Repository$3.call(Repository.java:373)
2022-06-08T18:30:27.1652987Z [junit] at org.openide.filesystems.Repository$3.call(Repository.java:368)
2022-06-08T18:30:27.1653709Z [junit] at org.openide.filesystems.Repository.delayFileSystemAttachImpl(Repository.java:1018)
2022-06-08T18:30:27.1654297Z [junit] at org.openide.filesystems.Repository.getDefault(Repository.java:368)
2022-06-08T18:30:27.1654794Z [junit] at org.netbeans.core.startup.Main.start(Main.java:298)
2022-06-08T18:30:27.1655764Z [junit] at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
2022-06-08T18:30:27.1656203Z [junit] at java.lang.Thread.run(Thread.java:750)
2022-06-08T18:30:27.1656523Z [junit]
2022-06-08T18:30:27.1656885Z [junit] at org.netbeans.core.startup.Main.start(Main.java:298)
2022-06-08T18:30:27.1657532Z [junit] at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
2022-06-08T18:30:27.1657948Z [junit] at java.lang.Thread.run(Thread.java:750)
The full log of the run is zipped: fulllog.txt.zip (The quoted part start in line 109186).
The listed stack trace was retrieved using an idea from @jtulach, by creating a fake exception in the Repository constructor, fetching the stacktrace from it and storing it into a public field of that class.
This seems to be a race when initializing the META-INF based services. The NbRepository that is expected, is registered as such a service and thus should be available from the default lookup.
This is the code in master (at time of writing):
https://github.com/apache/netbeans/blob/66e010eb26d449be68b3f0c860d1b5d1e8c375fd/platform/openide.filesystems/src/org/openide/filesystems/Repository.java#L348-L376
and the wrong Repository is created in line 362.
Did this work correctly in an earlier version?
No
Operating System
Linux
JDK
JDK 8 - 19 all show behavior sometimes
Apache NetBeans packaging
Other
WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.Repository is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@7a7b0070 (from SystemClassLoader[807 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@7ac7a4e4
While the race condition is under investigation, perhaps the test could detect the corruption and skip itself. Not sure how, given that it seems to have crashed the JVM rather than failing properly.
If I read the sequence correctly, the error happens in the startup sequenze of the tested NetBeans instance. The error roots in an assert in that sequenze. We can't change that to a JUnit assumption.
What is more, the error seems to only manifest on the CI/CD infrastructure, but not on my laptop.
Hi, this failed today in #4327 .
Quoting from the build log before it's deleted we can read (1)
2022-07-03T20:40:22.2367088Z [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.Repository is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343
2022-07-03T20:40:22.2368142Z [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.spi.ArchiveRootProvider is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343
Question is, if org.openide.filesystems.Repository is not "the real McCoy", how is the synchronized(Repository.class) block that @matthiasblaesing quoted above expected to behave?
I mean, if Repository.class is not "the real McCoy" then the synchronized(Repository.class) must be synchronizing in a "fake McCoy" and not in the "real McCoy", right?
BTW, who is McCoy?
Cheers, Antonio
(1)
2022-07-03T20:40:22.2342797Z [junit] java.lang.AssertionError: Has to be NbRepository: org.openide.filesystems.Repository@51b9c375
2022-07-03T20:40:22.2353885Z [junit] at org.netbeans.core.startup.Main.start(Main.java:298)
2022-07-03T20:40:22.2354312Z [junit] at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
2022-07-03T20:40:22.2354723Z [junit] at java.base/java.lang.Thread.run(Thread.java:829)
2022-07-03T20:40:22.2355358Z [junit] -------------------------------------------------------------------------------
2022-07-03T20:40:22.2355740Z [junit] >Log Session: Sunday, July 3, 2022 at 8:40:21 PM Coordinated Universal Time
2022-07-03T20:40:22.2356013Z [junit] >System Info:
2022-07-03T20:40:22.2356487Z [junit] Product Version = Apache NetBeans Platform Dev (Build dev-53a6ce54c0cf1cae704977185237f5125c3aa361)
2022-07-03T20:40:22.2356976Z [junit] Operating System = Linux version 5.13.0-1031-azure running on amd64
2022-07-03T20:40:22.2357445Z [junit] Java; VM; Vendor = 11.0.15; OpenJDK 64-Bit Server VM 11.0.15+10-LTS; Azul Systems, Inc.
2022-07-03T20:40:22.2357895Z [junit] Runtime = OpenJDK Runtime Environment 11.0.15+10-LTS
2022-07-03T20:40:22.2358317Z [junit] Java Home = /opt/hostedtoolcache/Java_Zulu_jdk/11.0.15-10/x64
2022-07-03T20:40:22.2358666Z [junit] System Locale; Encoding = en (nb); UTF-8
2022-07-03T20:40:22.2358932Z [junit] Home Directory = /home/runner
2022-07-03T20:40:22.2359232Z [junit] Current Directory = /home/runner/work/netbeans/netbeans
2022-07-03T20:40:22.2359744Z [junit] User Directory = /home/runner/work/netbeans/netbeans/platform/o.n.core/build/test/qa-functional/work/userdir1
2022-07-03T20:40:22.2360337Z [junit] Cache Directory = /home/runner/work/netbeans/netbeans/platform/o.n.core/build/test/qa-functional/work/userdir1/var/cache
2022-07-03T20:40:22.2360747Z [junit] Installation = /home/runner/work/netbeans/netbeans/nbbuild/netbeans/ide
2022-07-03T20:40:22.2361112Z [junit] /home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform
2022-07-03T20:40:22.2361512Z [junit] Boot & Ext. Classpath =
2022-07-03T20:40:22.2362980Z [junit] Application Classpath = loaded by URLClassLoader file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-util-ui.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/boot.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-util.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-util-lookup.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-modules.jar file:/opt/hostedtoolcache/Java_Zulu_jdk/11.0.15-10/x64/lib/jrt-fs.jar
2022-07-03T20:40:22.2365166Z [junit] Startup Classpath = /home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/core-base.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/org-openide-filesystems-compat8.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/asm-9.3.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/org-netbeans-libs-asm.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/org-openide-filesystems.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/asm-commons-9.3.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/asm-tree-9.3.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/core.jar
2022-07-03T20:40:22.2366364Z [junit] -------------------------------------------------------------------------------
2022-07-03T20:40:22.2367088Z [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.Repository is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343
2022-07-03T20:40:22.2368142Z [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.spi.ArchiveRootProvider is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343
2022-07-03T20:40:22.5039755Z [junit] Testsuite: org.netbeans.core.validation.ValidateLayerConsistencyTest
2022-07-03T20:40:22.5108340Z [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec
2022-07-03T20:40:22.5109036Z [junit]
2022-07-03T20:40:22.5109530Z [junit] Testcase: org.netbeans.core.validation.ValidateLayerConsistencyTest:testActionInstancesOnlyInActionsFolder: Caused an ERROR
2022-07-03T20:40:22.5110096Z [junit] Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
2022-07-03T20:40:22.5110632Z [junit] junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
2022-07-03T20:40:22.5111165Z [junit] at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5113034Z [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5113714Z [junit] at java.base/java.util.Vector.forEach(Vector.java:1388)
2022-07-03T20:40:22.5114263Z [junit] at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5114848Z [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5115435Z [junit] at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5117187Z [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5117766Z [junit] at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5119554Z [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5120205Z [junit] at java.base/java.util.Vector.forEach(Vector.java:1388)
2022-07-03T20:40:22.5120739Z [junit] at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5121244Z [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5121926Z [junit] at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5122469Z [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5122965Z [junit] at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5123470Z [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5123843Z [junit]
2022-07-03T20:40:22.5124022Z [junit]
2022-07-03T20:40:22.5554527Z [junit] Test org.netbeans.core.validation.ValidateLayerConsistencyTest FAILED (crashed)
2022-07-03T20:40:22.7809459Z [junit] WARNING: package com.apple.eio not in java.desktop
2022-07-03T20:40:22.8727524Z [junit] Testsuite: org.netbeans.core.validation.ValidateModulesTest
2022-07-03T20:40:23.4814479Z [junit] System class loader patches: /home/runner/work/netbeans/netbeans/platform/o.n.core/build/test/qa-functional/classes
Ah, I now see @jglick already posted about McCoy before...
When creating the new repository instance (i.e. when null) should that code be synchronized as well?
https://en.wikipedia.org/wiki/The_real_McCoy FYI
@vieiro
I mean, if
Repository.classis not "the real McCoy" then thesynchronized(Repository.class)must be synchronizing in a "fake McCoy" and not in the "real McCoy", right?
synchronized(Repository.class) means, that the synchronisation happens on the class Repository, not on a single instance of it. The problem happens in the code path, that returns the default repository and in there is a race condition. I suspect, that there is some Lookup code involved, that does some postprocessing of the results and delays the return, so that the default Repository is created.
@ebresie
When creating the new repository instance (i.e. when null) should that code be synchronized as well?
See my answer to @vieiro you can create as many repositories as you like. The problem is that only one Repository is used as the system wide repository and that needs to be the right one.
@matthiasblaesing
synchronized(Repository.class) means, that the synchronisation happens on the class Repository,
Well, of course. I was wondering if the class Repository may be coming from different ClassLoaders at different times (first for "the real McCoy" and later "for the fake McCoy", or viceversa). If so then that could be the race condition we're looking for, right? If this is possible then maybe we could replace the synchronized(Repository.class) , such as this: https://github.com/apache/netbeans/compare/master...vieiro:netbeans:hotfix/4211
c0313ec5f0a9f29bffd00f906ed791f7eb5187bd does not look like it would change anything.
Anyway I doubt the issue is about the synchronized statements. lkp.lookup(Repository.class) is returning null because there are two BootClassLoaders in play, somehow, and thus two loaded copies of platform/core/org-openide-filesystems.jar.
Normal behavior for the test would be for the class loader to be created like this
at org.netbeans.MainImpl$BootClassLoader.<init>(MainImpl.java:248)
at org.netbeans.MainImpl.execute(MainImpl.java:154)
at org.netbeans.MainImpl.main(MainImpl.java:60)
at org.netbeans.Main.main(Main.java:58)
at …
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.netbeans.junit.NbModuleSuite$S.runInRuntimeContainer(NbModuleSuite.java:905)
at org.netbeans.junit.NbModuleSuite$S.access$100(NbModuleSuite.java:660)
at org.netbeans.junit.NbModuleSuite$S$1.protect(NbModuleSuite.java:681)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at org.netbeans.junit.NbModuleSuite$S.run(NbModuleSuite.java:677)
at …
and the NbRepository to be created like this
at org.openide.filesystems.Repository.<init>(Repository.java:319)
at org.netbeans.core.startup.NbRepository.<init>(NbRepository.java:54)
at …
at java.base/java.lang.Class.newInstance(Class.java:645)
at org.openide.util.lookup.implspi.SharedClassObjectBridge.newInstance(SharedClassObjectBridge.java:41)
at org.openide.util.lookup.MetaInfServicesLookup$Item.getInstance(MetaInfServicesLookup.java:489)
at org.openide.util.lookup.AbstractLookup.lookup(AbstractLookup.java:397)
at org.openide.util.lookup.ProxyLookup.lookup(ProxyLookup.java:281)
at org.openide.filesystems.Repository$3.call(Repository.java:361)
at org.openide.filesystems.Repository$3.call(Repository.java:359)
at org.openide.filesystems.Repository.delayFileSystemAttachImpl(Repository.java:1009)
at org.openide.filesystems.Repository.getDefault(Repository.java:359)
at org.netbeans.core.startup.Main.start(Main.java:281)
at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
at java.base/java.lang.Thread.run(Thread.java:833)
I suspect the problem somehow pertains to JUnit or at least NbModuleSuite: that the failing test run is actually the second time NetBeans has been loaded in the JVM. The first time (possibly from a different test case), the default Lookup got an NbRepository as usual and cached it. The second time, the same Lookup class was used; when the new BootClassLoader was created, a new copy of Repository was loaded, and Repository.getDefault was called, the old NbRepository was found but the “real McCoy” defense detected that this was not actually assignable to the Repository class being queried, so it bailed out. Of course this is all not supposed to be possible, because each test run is supposed to load Lookup and everything else from scratch: https://github.com/apache/netbeans/blob/a1b803a832cb0c1161b4c522737452ea4d8d6737/harness/nbjunit/src/org/netbeans/junit/NbModuleSuite.java#L810-L814
You can see from the build log that Log Session is printed twice for ValidateLayerConsistencyTest, implying that TopThreadGroup is being run twice.
Is some sort of additional tear down logic needed in some context (after test or before test)to prevent or something to setup this correctly?