quarkus icon indicating copy to clipboard operation
quarkus copied to clipboard

OutOfMemoryException: Metaspace in quarkus-smallrye-openapi-deployment tests

Open MikeEdgar opened this issue 2 months ago • 20 comments

Describe the bug

As others have noted elsewhere, there is occasional instability with the tests/CI that originates from the quarkus-smallrye-openapi-deployment module. Based on some investigation, growth in metaspace use appears to be a mix of (1) heavy use of QuarkusUnitTest objects and (2) inherent metaspace usage of rest-assured to evaluate JsonPath expressions when testing REST responses.

For (1), quarkus-smallrye-openapi-deployment currently creates about 47 QuarkusUnitTest. Other extensions create more than that (e.g. tls-registry has 129 or so), but it seems that combining with (2) the metaspace limit is more quickly reached.

I'm opening this issue to work out how this can be resolved.

I've done some experimentation using Jayway JsonPath which does not cause metaspace use to increase from json-path evaluation, but it does mean a new managed dependency. Another (possible acceptable) downside is that using this library requires side-stepping rest-assured's native JsonPath features. Note - the Strimzi OAuth library has a direct dependency on Jayway JsonPath, but it is not yet managed in the Quarkus BOM.

Besides that, is there any optimization that can be done with QuarkusUnitTest to avoid reloading a large number of application classes that are common for each test case?

Expected behavior

Tests complete without OOME metaspace errors

Actual behavior

quarkus-smallrye-openapi-deployment tests may fail with OOME metaspace errors. Even when not failing on main, new PRs that add test cases will likely hit the OOME without some changes to how these tests work.

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

MikeEdgar avatar Oct 14 '25 17:10 MikeEdgar

/cc @EricWittmann (openapi), @Ladicek (smallrye), @jmartisk (smallrye), @phillip-kruger (openapi,smallrye), @radcortez (smallrye)

quarkus-bot[bot] avatar Oct 14 '25 17:10 quarkus-bot[bot]

It might be we just need to increase how much memory we allocate in the build config.

Or we could investigate to check if any classloader leaks have re-crept-in because of the test classloading rewrite, but that's harder work, and we do expect the 'high tide' of in-memory classes to be higher than it was before that work. The QuarkusUnitTest tests shouldn't be directly affected, but there could be some indirect effects.

holly-cummins avatar Oct 15 '25 11:10 holly-cummins

Yes, I can do some more experimentation with test metaspace allocations.

What is your thinking on the issue with rest-assured potentially leaking metaspace memory from json-path evaluations? It seems as though this could impact any single surefire execution making heavy use of that functionality and I'm guessing it's the bigger impact in the OpenAPI module versus the QuarkusUnitTest stuff.

MikeEdgar avatar Oct 15 '25 12:10 MikeEdgar

@MikeEdgar when using Jayway JsonPath, you still use REST Assured or you drop it entirely? Would you have an example of the resulting code?

gsmet avatar Oct 20 '25 08:10 gsmet

I'm not sure what would make you think we have a mestaspace leak? :)

Image

I'm having a look at what's going on, not sure it's something we can fix though.

gsmet avatar Oct 20 '25 09:10 gsmet

@gsmet I have a PR open [1] in rest-assured that seems to make a significant difference in the metaspace issue. Regarding Jayway, I was still using RestAssured for making HTTP requests and making simpler assertions about headers, etc., but extracting the result and using Jayway for path assertions about a JSON response. Based on the testing tried so far with the rest-assured PR though, I don't think that will be necessary anymore - TBD.

quarkus-smallrye-openapi-deployment tests with rest-assured 5.5.6:

Image

quarkus-smallrye-openapi-deployment tests with rest-assured PR :

Image

[1] https://github.com/rest-assured/rest-assured/pull/1844

MikeEdgar avatar Oct 20 '25 11:10 MikeEdgar

OK, that looks promising. Hopefully they will consider the patch.

If I were you, I would copy/paste these results in the REST Assured PR as they are pretty self-explanatory.

gsmet avatar Oct 20 '25 12:10 gsmet

Very nice @MikeEdgar!

Groovy has caused us pain the past - we have a couple hacks in the test codebase but it's clearly not enough

geoand avatar Oct 20 '25 15:10 geoand

From what I have seen, we have traded one leak for another. At least, for the one disabling ClassValue.

gsmet avatar Oct 20 '25 16:10 gsmet

My guess is that we can deal with those easier than we can with Groovy code

geoand avatar Oct 20 '25 16:10 geoand

While looking at the re-base of https://github.com/rest-assured/rest-assured/pull/1844 and why Groovy 5 causes new leaks, I've found there are several locations in Groovy that start-up "cleaner" threads [1]. Forcing the Groovy artifacts to use parent-first class loading seems to fix it, but I question whether how I'm doing it is correct. The local test simply adds these to the QuarkusBootstrap.Builder in QuarkusUnitTest.

.addParentFirstArtifact(ArtifactKey.of("org.apache.groovy", "groovy", null, "jar"))
.addParentFirstArtifact(ArtifactKey.of("org.apache.groovy", "groovy-json", null, "jar"))
.addParentFirstArtifact(ArtifactKey.of("org.apache.groovy", "groovy-xml", null, "jar"))

Is there a better way to do this (and do you agree parent-first is the right thing for Groovy)?

[1] https://github.com/apache/groovy/commit/0bd13c36649874f7bf4b2c7e98eeef4b2ae55e8d

MikeEdgar avatar Dec 06 '25 12:12 MikeEdgar

@MikeEdgar so yeah, I actually considered it when I was experimenting with my previous fix.

I think we need to be consistent between testing and runtime though or you could have all sorts of weird issues. That means we would need to add them to core/runtime/pom.xml with the others.

But what might be a bit of a problem is we probably need to add the whole Groovy stack there and I'm not sure about their dependencies. Now we could start small with just these ones and see how it goes and if people complain.

gsmet avatar Dec 06 '25 12:12 gsmet

I propose we just add all the groovy dependencies as parent first in core/runtime/pom.xm

geoand avatar Dec 06 '25 12:12 geoand

Also, I think we should propose a patch to Groovy for the long term: we need to have a way to stop these threads. It's not a really a good idea to start threads and let them live forever. Once this patch is integrated, we can reconsider if keeping Groovy parent first is the right solution or not.

Ideally we would have a unique entry point to properly shut down all Groovy caches/threads/....

gsmet avatar Dec 06 '25 12:12 gsmet

we can reconsider if keeping Groovy parent first is the right solution or not.

Considering how much it has caused us when loading it over and over again, I think it makes sense to do this regardless

geoand avatar Dec 06 '25 12:12 geoand

Is the loading/reloading an issue outside of the QuarkusUnitTest use case?

MikeEdgar avatar Dec 06 '25 12:12 MikeEdgar

I don't think many people use Groovy in Quarkus, so I have no idea

geoand avatar Dec 06 '25 12:12 geoand

I think we should create a PR with this and then ask the people at https://github.com/quarkiverse/quarkus-groovy for feedback.

gsmet avatar Dec 06 '25 12:12 gsmet

I think we should create a PR with this and then ask the people at https://github.com/quarkiverse/quarkus-groovy for feedback.

Is anyone planning on this? If not, I'll open the PR for the 3 Groovy artifacts mentioned above (as a starting point).

MikeEdgar avatar Dec 10 '25 14:12 MikeEdgar

@MikeEdgar given you're evaluating the situation with the new REST Assured version, go for it so that you can evaluate if it fixes the issue or not.

gsmet avatar Dec 10 '25 14:12 gsmet

Adding the three Groovy artifacts in core/runtime/pom.xml definitely helps with the thread leaking issue. It does not work with rest-assured 5.5.6 on Quarkus main because it breaks the internal ThreadLocal in rest-assured, together with the code the removes the value after each test. However, there is no thread leakage using the changes from my rest-assured PR. I don't think there is much more to do here until the PR is accepted and released.

MikeEdgar avatar Dec 10 '25 19:12 MikeEdgar