glassfish
glassfish copied to clipboard
Concurrency TCK failures with latest glassfish nightly build
Following Concurrency TCK failures with latest Glassfish 7 nightly Web Profile build:
- ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB
CI run - https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/95/
With commit 71a6150119dcfc0aeafcfd11aa480c9da408ce56 it passed locally. Can you rerun the test?
I am looking at the AfterTypeDiscoveryMassOperationsTest now ... The test changed recently ... https://github.com/jakartaee/cdi-tck/pull/401/files ... on the line 128 is not an assertEquals now, but previous version has an assertTrue there.
https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-cdi-tck-glassfish-run-cert/org.glassfish$glassfish.cdi-tck/19/testReport/junit/org.jboss.cdi.tck.tests.full.extensions.lifecycle.atd/AfterTypeDiscoveryTest/testFinalAlternatives_null__1_/
===============================================
CDI TCK
Total tests run: 1831, Passes: 1831, Failures: 0, Skips: 0
===============================================
Seems the build fetches an old CDI TCK:
[INFO] Downloading from jakarta-snapshots: https://jakarta.oss.sonatype.org/content/repositories/staging/jakarta/enterprise/cdi-tck-api/4.0.5/cdi-tck-api-4.0.5.pom
see here
Those errors were fixed in CDI TCK 4.0.7, which was released after the PR stated by @dmatej.
Thanks @erdlet, @dmatej. I have updated TCK runner to use 4.0.7 - https://github.com/eclipse-ee4j/jakartaee-tck/pull/1104/files
@erdlet @scottmarlow - should the above CDI TCK test update be sent to Specification Committee for approval, since this is an change in behavior of the tests? As per current process, test behavior is not allowed to change post review, tests can be excluded based on challenge. CC @dmatej @arjantijms
I'm no expert in CDI TCK, so as far as I understood was this change only to improve the test assumption, not the expected result.
It seems to me that the test was just stabilized to allow to use also other extensions while previous expected just three which was wrong assumption. It is ok now.
Following Concurrency TCK failures with latest Glassfish 7 nightly Web Profile build:
* ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB
CI run - https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/95/
Some Concurrency TCK problems with ManagedExecutorDefinitionWebTests were mentioned started in comment https://github.com/jakartaee/concurrency/pull/260#pullrequestreview-1055907581 that seem to continue up to comment https://github.com/jakartaee/concurrency/pull/260#issuecomment-1200112668
Does anyone know why GlassFish is failing the Concurrency test on Web Profile yet?
Should GlassFish use https://repo1.maven.org/maven2/jakarta/enterprise/concurrent/jakarta.enterprise.concurrent-api/3.0.1/jakarta.enterprise.concurrent-api-3.0.1.jar or https://repo1.maven.org/maven2/jakarta/enterprise/concurrent/jakarta.enterprise.concurrent-api/3.0.0/jakarta.enterprise.concurrent-api-3.0.0.jar in its Web Profile testing? We don't think it would make a difference but asking just the same.
Does anyone know why GlassFish is failing the Concurrency test on Web Profile yet?
A new test was added that for some reason fails. It fails on both the full profile and the web profile.
Does anyone know why GlassFish is failing the Concurrency test on Web Profile yet?
A new test was added that for some reason fails. It fails on both the full profile and the web profile.
According to https://github.com/eclipse-ee4j/jakartaee-tck/wiki/Jakarta-EE-10.0-TCK-results, the Concurrency TCK test only fails on Web Profile but not Full Platform mode.
So, the only thing that we know is the ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB
is failing for unknown reasons.
We are also testing against the Staged Concurrency TCK https://download.eclipse.org/ee4j/cu/jakartaee10/staged/eftl/concurrency-tck-3.0.2.zip last built on 2022-07-30
.
Once thing that we haven't done is create a Concurrency TCK issue (or challenge if we learn of something to challenge). Any volunteers to create a https://github.com/jakartaee/concurrency/issues for this failure? Beyond the test name, I think we also need to include the test output, some of which I pasted into https://gist.github.com/scottmarlow/045911483dfca81173ce82afcf04a362 (including some of tests before and after in case that is relevant).
I think that there has been discussions with the Concurrency team but I would like a tracking Concurrency issue so we can get more direct feedback on what the test failure means happened exactly. It would be helpful to have an explanation of what the test code at https://github.com/jakartaee/concurrency/blob/master/tck/src/main/java/ee/jakarta/tck/concurrent/spec/ManagedExecutorService/resourcedef/ManagedExecutorDefinitionOnEJBServlet.java#L144 is validating exactly which might be a clue as to what needs to change (in the test or GlassFish).
A problem of the same test class is discussed below and suggested to be fixed. Isn't this a related problem?
https://github.com/jakartaee/concurrency/issues/263
@hs536 good question, do you know if the https://github.com/jakartaee/concurrency/pull/264 fix would help GlassFish pass the Concurrency TCK on Web Profile?
@arjantijms Can you validate the latest Concurrency TCK with https://github.com/jakartaee/concurrency/pull/264 ?
The current CCRs are already out of date, so this can be merged:
current:
shasum -a 256 /tmp/jakarta-jakartaeetck-10.0.0.zip
cfe4db9cf5f2bf5cfff568377f7a799c3dc4624e882585a0c659d781764183dd /tmp/jakarta-jakartaeetck-10.0.0.zip
current CCR sha256: 62a9cbbf46315ef4b5487fdbdf6e628d272fa3ace9148c5716ba57ea4415df19
I started the Platform TCK build via https://ci.eclipse.org/jakartaee-tck/job/10/job/eftl-jakartaeetck-build-100/82/ for jakartaee-tck/pull/1109
GlassFish 7 Web Profile still fails with the updated staged https://download.eclipse.org/ee4j/cu/jakartaee10/staged/eftl/concurrency-tck-3.0.2.zip
TCK via https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/102 which failed the ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB
test.
FYI, the ^ test run was done with https://download.eclipse.org/ee4j/cu/jakartaee10/staged/eftl/concurrency-tck-3.0.2.zip that has sha256sum 22728d729f620d6a85ae903e7d1184e0a7508a4328491b785f1b4f3d7215ca93
.
Contents of concurrency-tck-3.0.2.info:
22728d729f620d6a85ae903e7d1184e0a7508a4328491b785f1b4f3d7215ca93 concurrency-tck-3.0.2-dist.zip
-rw-r--r--. 1 jenkins 1001550000 423031 Aug 9 15:56 concurrency-tck-3.0.2-dist.zip
https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/102/artifact/glassfish-runner/concurrency-tck/target/glassfish7/glassfish/domains/domain1/logs/server.log contains an interesting warning on a related test:
2022-08-09T16:28:44.055266Z] [GlassFish 7.0] [WARNING] [] [ee.jakarta.tck.concurrent.framework.TestServlet] [tid: _ThreadID=28 _ThreadName=http-listener-1(1)] [levelValue: 900] [[
Caught exception attempting to call test method testCopyCompletableFutureEJB on servlet ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionOnEJBServlet
java.lang.AssertionError: StringContext must be propagated and Application context and IntContext must be left unchanged per ManagedExecutorDefinition and ContextServiceDefinition config. expected [StringContext propagated;IntContext unchanged] but found [StringContext propagated;IntContext incorrect:271]
at org.testng.Assert.fail(Assert.java:99)
at org.testng.Assert.failNotEquals(Assert.java:1037)
at org.testng.Assert.assertEqualsImpl(Assert.java:140)
at org.testng.Assert.assertEquals(Assert.java:122)
at org.testng.Assert.assertEquals(Assert.java:629)
at ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionOnEJBServlet.testCopyCompletableFutureEJB(ManagedExecutorDefinitionOnEJBServlet.java:144)
Could the ^ warning leave GlassFish in a bad state?
Also @jamezp noticed the following warning which would be good to understand better:
[2022-08-09T16:28:44.053280Z] [GlassFish 7.0] [SEVERE] [] [jakarta.enterprise.concurrent] [tid: _ThreadID=389 _ThreadName=java:module/concurrent/ExecutorB-managedThreadFactory-Thread-1] [levelValue: 1000] [[
Thread context provider 'IntContext' is not registered in WEB-APP/services/jakarta.enterprise.concurrent.spi.ThreadContextProvider and will be ignored!]]
My gut tells me the failure has something to do with that error log. The IntContext
is what the test is using for checking the assertion. Having it not registered seems like a possible issue.
Hmm, I'll try to grep for the error.
GlassFish class appserver/extras/embedded/all/target/classes/org/glassfish/concurrent/runtime/ContextSetupProviderImpl.class seems to contain the "Thread context provider.*is not registered in"
message, but I am not sure where GlassFish is getting the org.glassfish.concurrent.runtime.ContextSetupProviderImpl
class from, that likely needs an update.
@arjan - A challenge for test based on glassfish logs is needed? or fix at glassfish is the final solution? CC @scottmarlow Please note: fix https://github.com/jakartaee/concurrency/pull/264 is not helping GF to pass CU TCK.
I am not sure where GlassFish is getting the
org.glassfish.concurrent.runtime.ContextSetupProviderImpl
class from, that likely needs an update.
https://github.com/eclipse-ee4j/glassfish/blob/master/appserver/concurrent/concurrent-impl/src/main/java/org/glassfish/concurrent/runtime/ContextSetupProviderImpl.java
The question is what update ... there is also some commented out code ... https://github.com/eclipse-ee4j/glassfish/blob/master/appserver/concurrent/concurrent-impl/src/main/java/org/glassfish/concurrent/runtime/ContextSetupProviderImpl.java#L199
I ran the TCK locally and here are my findings:
- When I only run the test method
testCopyCompletableFutureEJB
ofee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests
, the test passes - When I run the full Concurrency Web TCK, or when I run all the test methods in the
ManagedExecutorDefinitionWebTests
class, the test fails
The test also fails when I only execute the test method testCopyCompletableFuture
before it.
Therefore GlassFish is "polluted" by the test testCopyCompletableFuture
, which causes the test testCopyCompletableFutureEJB
to fail.
I ran the TCK locally and here are my findings:
* When I only run the test method `testCopyCompletableFutureEJB` of `ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests`, the test passes * When I run the full Concurrency Web TCK, or when I run all the test methods in the `ManagedExecutorDefinitionWebTests` class, the test fails
The test also fails when I only execute the test method
testCopyCompletableFuture
before it.Therefore GlassFish is "polluted" by the test
testCopyCompletableFuture
, which causes the testtestCopyCompletableFutureEJB
to fail.
So, GlassFish could pass all of the tests if testCopyCompletableFutureEJB
is run separately. I do not know of any requirements for tests to be run all at the same time. With the Platform TCK, if all tests were run at once, it would take multiple days to run all of the tests, instead we run smaller sets of tests. Also, I think the GlassFish Platform TCK Porting kit retries tests on the first failure, which I think would be the same.
According to https://github.com/eclipse-ee4j/jakartaee-tck/wiki/Jakarta-EE-10.0-TCK-results, the Concurrency TCK test only fails on Web Profile but not Full Platform mode.
I think that the Full Platform mode simply excludes that test. If you alter the test file to not exclude anything for Full Platform mode, when running against GlassFish full platform, all the new web profile test also run against the full profile glassfish. This includes the failing web profile test, and it fails on glassfish full profile too.
If you normally run the full profile TCK against full profile glassfish it excludes the web profile tests, and then it passes.
So that indicates that very specific test fails on both GlassFish web profile, and GlassFish full profile, even though the web profile and full profile TCK runs don't show that.
As per https://github.com/eclipse-ee4j/jakartaee-tck/pull/1111#issuecomment-1214155260, this is now solved via the https://github.com/eclipse-ee4j/jakartaee-tck/pull/1111 change.
In summary, the workaround is done following what was observed in comment https://github.com/eclipse-ee4j/glassfish/issues/24059#issuecomment-1211640262 above. We exclude the failing web
test and run that test separately.
According to https://github.com/eclipse-ee4j/jakartaee-tck/wiki/Jakarta-EE-10.0-TCK-results, the Concurrency TCK test only fails on Web Profile but not Full Platform mode.
I think that the Full Platform mode simply excludes that test. If you alter the test file to not exclude anything for Full Platform mode, when running against GlassFish full platform, all the new web profile test also run against the full profile glassfish. This includes the failing web profile test, and it fails on glassfish full profile too.
If you normally run the full profile TCK against full profile glassfish it excludes the web profile tests, and then it passes.
So that indicates that very specific test fails on both GlassFish web profile, and GlassFish full profile, even though the web profile and full profile TCK runs don't show that.
Is this mentioned in the Concurrency TCK doc requirements? I'm not against running both profiles for Full Platform.
Although it's possible to pass the TCK if the failing test is executed in a separate test run, it seems there's some profound error in GlassFish.
The test fails only if executed after the testCopyCompletableFuture
test, which essentially tests the same behavior but in servlet context. When I created a copy of the testCopyCompletableFutureEJB
and executed after the testCopyCompletableFutureEJB
, it again failed. This basically means that when the test testCopyCompletableFutureEJB
is executed twice, it always fails the second time. The same happens if I switch the order of tests and execute testCopyCompletableFutureEJB
and then execute testCopyCompletableFuture
test - the EJB one passes and the servlet one (the latter) fails this time.
I still need to investigate why the second run always fails (whether it's in servlet context or EJB context seems to not matter) but we should really fix it in GlassFish. However, this issue probably shouldn't block a CCR because GF can pass the Web TCK without breaking any rules.
After some investigation and debugging, I see several issues in GF:
-
ContextSetupProviderImpl.java in
omnifaces/omniconcurrent
copies a reference to one of the contexts to theallRemaining
variable, and later modifies the allRemianing variable when a context is saved for asynchronous execution. At this point, also one of the contexts is modified, which means that the configuration is mangled and the sets of contexts don't correspond to the original setup. In this particular case, theIntContext
is removed fromcontextUnchanged
and therefore it's propagated because "remaining" contexts are set to propagate. I think a fix is to change the asssignments toallRemaining
fromallRemaining = contextPropagate
and other contexts toallRemaining = new HashSet(contextPropagate)
copy the sets instead. - Even after the fix I suggest in the first point, the list of ThreadContextProvider services is empty when the context is saved from an asynchronous thread. This is because in a new thread used to execute a completableFuture handler,
Utility.getClassLoader()
isnull
, pointing to the JDK platform classlaoder, and this classloader doesn't see the definitions in the deployed application. Nothing sets the thread context classloader for the completableFuture threads to the application'sWebAppClassloader
. This means that the custom contexts are invalid in subsequent handlers and are ignored.- A fix to set up the correct classloader isn't simple. We could store the current context classloader in the thread object and set it in the thread's run method before the task is executed. But the catch is that, currently, the thread is being created with the JDK platform classloader as the context classloader in ConcurrentRuntime.java#L473. This is because of a classloader leak issue reported in Payara here and fixed here. Maybe we could store the app classloader in the thread object in a weak reference to avoid any leaks but this needs a thorough investigation.
- Alternatively, ContextSetupProviderImpl shouldn't load
ThreadContextProvider
services each time thesaveContext
is called. It could be executed once ifnull
or even eagerly in constructor, there's no point in loading them each time, potentially with a different classloader, or am I wrong?