glassfish icon indicating copy to clipboard operation
glassfish copied to clipboard

Concurrency TCK failures with latest glassfish nightly build

Open gurunrao opened this issue 2 years ago • 33 comments

Following Concurrency TCK failures with latest Glassfish 7 nightly Web Profile build:

  • ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB

CI run - https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/95/

gurunrao avatar Aug 01 '22 07:08 gurunrao

With commit 71a6150119dcfc0aeafcfd11aa480c9da408ce56 it passed locally. Can you rerun the test?

I am looking at the AfterTypeDiscoveryMassOperationsTest now ... The test changed recently ... https://github.com/jakartaee/cdi-tck/pull/401/files ... on the line 128 is not an assertEquals now, but previous version has an assertTrue there.

https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-cdi-tck-glassfish-run-cert/org.glassfish$glassfish.cdi-tck/19/testReport/junit/org.jboss.cdi.tck.tests.full.extensions.lifecycle.atd/AfterTypeDiscoveryTest/testFinalAlternatives_null__1_/

===============================================
CDI TCK
Total tests run: 1831, Passes: 1831, Failures: 0, Skips: 0
===============================================

dmatej avatar Aug 01 '22 09:08 dmatej

Seems the build fetches an old CDI TCK: [INFO] Downloading from jakarta-snapshots: https://jakarta.oss.sonatype.org/content/repositories/staging/jakarta/enterprise/cdi-tck-api/4.0.5/cdi-tck-api-4.0.5.pom see here

Those errors were fixed in CDI TCK 4.0.7, which was released after the PR stated by @dmatej.

erdlet avatar Aug 01 '22 09:08 erdlet

Thanks @erdlet, @dmatej. I have updated TCK runner to use 4.0.7 - https://github.com/eclipse-ee4j/jakartaee-tck/pull/1104/files

gurunrao avatar Aug 01 '22 10:08 gurunrao

@erdlet @scottmarlow - should the above CDI TCK test update be sent to Specification Committee for approval, since this is an change in behavior of the tests? As per current process, test behavior is not allowed to change post review, tests can be excluded based on challenge. CC @dmatej @arjantijms

gurunrao avatar Aug 01 '22 10:08 gurunrao

I'm no expert in CDI TCK, so as far as I understood was this change only to improve the test assumption, not the expected result.

erdlet avatar Aug 01 '22 10:08 erdlet

It seems to me that the test was just stabilized to allow to use also other extensions while previous expected just three which was wrong assumption. It is ok now.

dmatej avatar Aug 01 '22 11:08 dmatej

Following Concurrency TCK failures with latest Glassfish 7 nightly Web Profile build:

* ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB

CI run - https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/95/

Some Concurrency TCK problems with ManagedExecutorDefinitionWebTests were mentioned started in comment https://github.com/jakartaee/concurrency/pull/260#pullrequestreview-1055907581 that seem to continue up to comment https://github.com/jakartaee/concurrency/pull/260#issuecomment-1200112668

scottmarlow avatar Aug 01 '22 13:08 scottmarlow

Does anyone know why GlassFish is failing the Concurrency test on Web Profile yet?

scottmarlow avatar Aug 03 '22 15:08 scottmarlow

Should GlassFish use https://repo1.maven.org/maven2/jakarta/enterprise/concurrent/jakarta.enterprise.concurrent-api/3.0.1/jakarta.enterprise.concurrent-api-3.0.1.jar or https://repo1.maven.org/maven2/jakarta/enterprise/concurrent/jakarta.enterprise.concurrent-api/3.0.0/jakarta.enterprise.concurrent-api-3.0.0.jar in its Web Profile testing? We don't think it would make a difference but asking just the same.

scottmarlow avatar Aug 03 '22 15:08 scottmarlow

Does anyone know why GlassFish is failing the Concurrency test on Web Profile yet?

A new test was added that for some reason fails. It fails on both the full profile and the web profile.

arjantijms avatar Aug 08 '22 11:08 arjantijms

Does anyone know why GlassFish is failing the Concurrency test on Web Profile yet?

A new test was added that for some reason fails. It fails on both the full profile and the web profile.

According to https://github.com/eclipse-ee4j/jakartaee-tck/wiki/Jakarta-EE-10.0-TCK-results, the Concurrency TCK test only fails on Web Profile but not Full Platform mode.

So, the only thing that we know is the ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB is failing for unknown reasons.

We are also testing against the Staged Concurrency TCK https://download.eclipse.org/ee4j/cu/jakartaee10/staged/eftl/concurrency-tck-3.0.2.zip last built on 2022-07-30.

Once thing that we haven't done is create a Concurrency TCK issue (or challenge if we learn of something to challenge). Any volunteers to create a https://github.com/jakartaee/concurrency/issues for this failure? Beyond the test name, I think we also need to include the test output, some of which I pasted into https://gist.github.com/scottmarlow/045911483dfca81173ce82afcf04a362 (including some of tests before and after in case that is relevant).

I think that there has been discussions with the Concurrency team but I would like a tracking Concurrency issue so we can get more direct feedback on what the test failure means happened exactly. It would be helpful to have an explanation of what the test code at https://github.com/jakartaee/concurrency/blob/master/tck/src/main/java/ee/jakarta/tck/concurrent/spec/ManagedExecutorService/resourcedef/ManagedExecutorDefinitionOnEJBServlet.java#L144 is validating exactly which might be a clue as to what needs to change (in the test or GlassFish).

scottmarlow avatar Aug 08 '22 19:08 scottmarlow

A problem of the same test class is discussed below and suggested to be fixed. Isn't this a related problem?

https://github.com/jakartaee/concurrency/issues/263

hs536 avatar Aug 09 '22 00:08 hs536

@hs536 good question, do you know if the https://github.com/jakartaee/concurrency/pull/264 fix would help GlassFish pass the Concurrency TCK on Web Profile?

scottmarlow avatar Aug 09 '22 13:08 scottmarlow

@arjantijms Can you validate the latest Concurrency TCK with https://github.com/jakartaee/concurrency/pull/264 ?

starksm64 avatar Aug 09 '22 15:08 starksm64

The current CCRs are already out of date, so this can be merged:

current:
shasum -a 256 /tmp/jakarta-jakartaeetck-10.0.0.zip 
cfe4db9cf5f2bf5cfff568377f7a799c3dc4624e882585a0c659d781764183dd  /tmp/jakarta-jakartaeetck-10.0.0.zip

current CCR sha256: 62a9cbbf46315ef4b5487fdbdf6e628d272fa3ace9148c5716ba57ea4415df19

starksm64 avatar Aug 09 '22 15:08 starksm64

I started the Platform TCK build via https://ci.eclipse.org/jakartaee-tck/job/10/job/eftl-jakartaeetck-build-100/82/ for jakartaee-tck/pull/1109

scottmarlow avatar Aug 09 '22 16:08 scottmarlow

GlassFish 7 Web Profile still fails with the updated staged https://download.eclipse.org/ee4j/cu/jakartaee10/staged/eftl/concurrency-tck-3.0.2.zip TCK via https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/102 which failed the ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests.testCopyCompletableFutureEJB test.

FYI, the ^ test run was done with https://download.eclipse.org/ee4j/cu/jakartaee10/staged/eftl/concurrency-tck-3.0.2.zip that has sha256sum 22728d729f620d6a85ae903e7d1184e0a7508a4328491b785f1b4f3d7215ca93.

Contents of concurrency-tck-3.0.2.info:

22728d729f620d6a85ae903e7d1184e0a7508a4328491b785f1b4f3d7215ca93  concurrency-tck-3.0.2-dist.zip
-rw-r--r--. 1 jenkins 1001550000 423031 Aug  9 15:56 concurrency-tck-3.0.2-dist.zip

scottmarlow avatar Aug 09 '22 16:08 scottmarlow

https://ci.eclipse.org/jakartaee-tck/job/10/job/jakarta-concurrency-tck-glassfish-run/102/artifact/glassfish-runner/concurrency-tck/target/glassfish7/glassfish/domains/domain1/logs/server.log contains an interesting warning on a related test:

2022-08-09T16:28:44.055266Z] [GlassFish 7.0] [WARNING] [] [ee.jakarta.tck.concurrent.framework.TestServlet] [tid: _ThreadID=28 _ThreadName=http-listener-1(1)] [levelValue: 900] [[
  Caught exception attempting to call test method testCopyCompletableFutureEJB on servlet ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionOnEJBServlet
java.lang.AssertionError: StringContext must be propagated and Application context and IntContext must be left unchanged per ManagedExecutorDefinition and ContextServiceDefinition config. expected [StringContext propagated;IntContext unchanged] but found [StringContext propagated;IntContext incorrect:271]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotEquals(Assert.java:1037)
	at org.testng.Assert.assertEqualsImpl(Assert.java:140)
	at org.testng.Assert.assertEquals(Assert.java:122)
	at org.testng.Assert.assertEquals(Assert.java:629)
	at ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionOnEJBServlet.testCopyCompletableFutureEJB(ManagedExecutorDefinitionOnEJBServlet.java:144)

Could the ^ warning leave GlassFish in a bad state?

Also @jamezp noticed the following warning which would be good to understand better:

[2022-08-09T16:28:44.053280Z] [GlassFish 7.0] [SEVERE] [] [jakarta.enterprise.concurrent] [tid: _ThreadID=389 _ThreadName=java:module/concurrent/ExecutorB-managedThreadFactory-Thread-1] [levelValue: 1000] [[
  Thread context provider 'IntContext' is not registered in WEB-APP/services/jakarta.enterprise.concurrent.spi.ThreadContextProvider and will be ignored!]]

scottmarlow avatar Aug 09 '22 16:08 scottmarlow

My gut tells me the failure has something to do with that error log. The IntContext is what the test is using for checking the assertion. Having it not registered seems like a possible issue.

jamezp avatar Aug 09 '22 17:08 jamezp

Hmm, I'll try to grep for the error.

scottmarlow avatar Aug 09 '22 17:08 scottmarlow

GlassFish class appserver/extras/embedded/all/target/classes/org/glassfish/concurrent/runtime/ContextSetupProviderImpl.class seems to contain the "Thread context provider.*is not registered in" message, but I am not sure where GlassFish is getting the org.glassfish.concurrent.runtime.ContextSetupProviderImpl class from, that likely needs an update.

scottmarlow avatar Aug 09 '22 17:08 scottmarlow

@arjan - A challenge for test based on glassfish logs is needed? or fix at glassfish is the final solution? CC @scottmarlow Please note: fix https://github.com/jakartaee/concurrency/pull/264 is not helping GF to pass CU TCK.

gurunrao avatar Aug 10 '22 17:08 gurunrao

I am not sure where GlassFish is getting the org.glassfish.concurrent.runtime.ContextSetupProviderImpl class from, that likely needs an update.

https://github.com/eclipse-ee4j/glassfish/blob/master/appserver/concurrent/concurrent-impl/src/main/java/org/glassfish/concurrent/runtime/ContextSetupProviderImpl.java

The question is what update ... there is also some commented out code ... https://github.com/eclipse-ee4j/glassfish/blob/master/appserver/concurrent/concurrent-impl/src/main/java/org/glassfish/concurrent/runtime/ContextSetupProviderImpl.java#L199

dmatej avatar Aug 10 '22 22:08 dmatej

I ran the TCK locally and here are my findings:

  • When I only run the test method testCopyCompletableFutureEJB of ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests, the test passes
  • When I run the full Concurrency Web TCK, or when I run all the test methods in the ManagedExecutorDefinitionWebTests class, the test fails

The test also fails when I only execute the test method testCopyCompletableFuture before it.

Therefore GlassFish is "polluted" by the test testCopyCompletableFuture, which causes the test testCopyCompletableFutureEJB to fail.

OndroMih avatar Aug 11 '22 07:08 OndroMih

I ran the TCK locally and here are my findings:

* When I only run the test method `testCopyCompletableFutureEJB` of `ee.jakarta.tck.concurrent.spec.ManagedExecutorService.resourcedef.ManagedExecutorDefinitionWebTests`, the test passes

* When I run the full Concurrency Web TCK, or when I run all the test methods in the `ManagedExecutorDefinitionWebTests` class, the test fails

The test also fails when I only execute the test method testCopyCompletableFuture before it.

Therefore GlassFish is "polluted" by the test testCopyCompletableFuture, which causes the test testCopyCompletableFutureEJB to fail.

So, GlassFish could pass all of the tests if testCopyCompletableFutureEJB is run separately. I do not know of any requirements for tests to be run all at the same time. With the Platform TCK, if all tests were run at once, it would take multiple days to run all of the tests, instead we run smaller sets of tests. Also, I think the GlassFish Platform TCK Porting kit retries tests on the first failure, which I think would be the same.

scottmarlow avatar Aug 11 '22 12:08 scottmarlow

According to https://github.com/eclipse-ee4j/jakartaee-tck/wiki/Jakarta-EE-10.0-TCK-results, the Concurrency TCK test only fails on Web Profile but not Full Platform mode.

I think that the Full Platform mode simply excludes that test. If you alter the test file to not exclude anything for Full Platform mode, when running against GlassFish full platform, all the new web profile test also run against the full profile glassfish. This includes the failing web profile test, and it fails on glassfish full profile too.

If you normally run the full profile TCK against full profile glassfish it excludes the web profile tests, and then it passes.

So that indicates that very specific test fails on both GlassFish web profile, and GlassFish full profile, even though the web profile and full profile TCK runs don't show that.

arjantijms avatar Aug 11 '22 19:08 arjantijms

As per https://github.com/eclipse-ee4j/jakartaee-tck/pull/1111#issuecomment-1214155260, this is now solved via the https://github.com/eclipse-ee4j/jakartaee-tck/pull/1111 change.

In summary, the workaround is done following what was observed in comment https://github.com/eclipse-ee4j/glassfish/issues/24059#issuecomment-1211640262 above. We exclude the failing web test and run that test separately.

scottmarlow avatar Aug 13 '22 13:08 scottmarlow

According to https://github.com/eclipse-ee4j/jakartaee-tck/wiki/Jakarta-EE-10.0-TCK-results, the Concurrency TCK test only fails on Web Profile but not Full Platform mode.

I think that the Full Platform mode simply excludes that test. If you alter the test file to not exclude anything for Full Platform mode, when running against GlassFish full platform, all the new web profile test also run against the full profile glassfish. This includes the failing web profile test, and it fails on glassfish full profile too.

If you normally run the full profile TCK against full profile glassfish it excludes the web profile tests, and then it passes.

So that indicates that very specific test fails on both GlassFish web profile, and GlassFish full profile, even though the web profile and full profile TCK runs don't show that.

Is this mentioned in the Concurrency TCK doc requirements? I'm not against running both profiles for Full Platform.

scottmarlow avatar Aug 13 '22 13:08 scottmarlow

Although it's possible to pass the TCK if the failing test is executed in a separate test run, it seems there's some profound error in GlassFish.

The test fails only if executed after the testCopyCompletableFuture test, which essentially tests the same behavior but in servlet context. When I created a copy of the testCopyCompletableFutureEJB and executed after the testCopyCompletableFutureEJB, it again failed. This basically means that when the test testCopyCompletableFutureEJB is executed twice, it always fails the second time. The same happens if I switch the order of tests and execute testCopyCompletableFutureEJB and then execute testCopyCompletableFuture test - the EJB one passes and the servlet one (the latter) fails this time.

I still need to investigate why the second run always fails (whether it's in servlet context or EJB context seems to not matter) but we should really fix it in GlassFish. However, this issue probably shouldn't block a CCR because GF can pass the Web TCK without breaking any rules.

OndroMih avatar Aug 13 '22 22:08 OndroMih

After some investigation and debugging, I see several issues in GF:

  • ContextSetupProviderImpl.java in omnifaces/omniconcurrent copies a reference to one of the contexts to the allRemaining variable, and later modifies the allRemianing variable when a context is saved for asynchronous execution. At this point, also one of the contexts is modified, which means that the configuration is mangled and the sets of contexts don't correspond to the original setup. In this particular case, the IntContext is removed from contextUnchanged and therefore it's propagated because "remaining" contexts are set to propagate. I think a fix is to change the asssignments to allRemaining from allRemaining = contextPropagate and other contexts to allRemaining = new HashSet(contextPropagate) copy the sets instead.
  • Even after the fix I suggest in the first point, the list of ThreadContextProvider services is empty when the context is saved from an asynchronous thread. This is because in a new thread used to execute a completableFuture handler, Utility.getClassLoader() is null, pointing to the JDK platform classlaoder, and this classloader doesn't see the definitions in the deployed application. Nothing sets the thread context classloader for the completableFuture threads to the application's WebAppClassloader. This means that the custom contexts are invalid in subsequent handlers and are ignored.
    • A fix to set up the correct classloader isn't simple. We could store the current context classloader in the thread object and set it in the thread's run method before the task is executed. But the catch is that, currently, the thread is being created with the JDK platform classloader as the context classloader in ConcurrentRuntime.java#L473. This is because of a classloader leak issue reported in Payara here and fixed here. Maybe we could store the app classloader in the thread object in a weak reference to avoid any leaks but this needs a thorough investigation.
    • Alternatively, ContextSetupProviderImpl shouldn't load ThreadContextProvider services each time the saveContext is called. It could be executed once if null or even eagerly in constructor, there's no point in loading them each time, potentially with a different classloader, or am I wrong?

OndroMih avatar Aug 14 '22 08:08 OndroMih