OutOfMemoryException: Metaspace in quarkus-smallrye-openapi-deployment tests
Describe the bug
As others have noted elsewhere, there is occasional instability with the tests/CI that originates from the quarkus-smallrye-openapi-deployment module. Based on some investigation, growth in metaspace use appears to be a mix of (1) heavy use of QuarkusUnitTest objects and (2) inherent metaspace usage of rest-assured to evaluate JsonPath expressions when testing REST responses.
For (1), quarkus-smallrye-openapi-deployment currently creates about 47 QuarkusUnitTest. Other extensions create more than that (e.g. tls-registry has 129 or so), but it seems that combining with (2) the metaspace limit is more quickly reached.
I'm opening this issue to work out how this can be resolved.
I've done some experimentation using Jayway JsonPath which does not cause metaspace use to increase from json-path evaluation, but it does mean a new managed dependency. Another (possible acceptable) downside is that using this library requires side-stepping rest-assured's native JsonPath features. Note - the Strimzi OAuth library has a direct dependency on Jayway JsonPath, but it is not yet managed in the Quarkus BOM.
Besides that, is there any optimization that can be done with QuarkusUnitTest to avoid reloading a large number of application classes that are common for each test case?
Expected behavior
Tests complete without OOME metaspace errors
Actual behavior
quarkus-smallrye-openapi-deployment tests may fail with OOME metaspace errors. Even when not failing on main, new PRs that add test cases will likely hit the OOME without some changes to how these tests work.
How to Reproduce?
No response
Output of uname -a or ver
No response
Output of java -version
No response
Quarkus version or git rev
No response
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
No response
/cc @EricWittmann (openapi), @Ladicek (smallrye), @jmartisk (smallrye), @phillip-kruger (openapi,smallrye), @radcortez (smallrye)
It might be we just need to increase how much memory we allocate in the build config.
Or we could investigate to check if any classloader leaks have re-crept-in because of the test classloading rewrite, but that's harder work, and we do expect the 'high tide' of in-memory classes to be higher than it was before that work. The QuarkusUnitTest tests shouldn't be directly affected, but there could be some indirect effects.
Yes, I can do some more experimentation with test metaspace allocations.
What is your thinking on the issue with rest-assured potentially leaking metaspace memory from json-path evaluations? It seems as though this could impact any single surefire execution making heavy use of that functionality and I'm guessing it's the bigger impact in the OpenAPI module versus the QuarkusUnitTest stuff.
@MikeEdgar when using Jayway JsonPath, you still use REST Assured or you drop it entirely? Would you have an example of the resulting code?
I'm not sure what would make you think we have a mestaspace leak? :)
I'm having a look at what's going on, not sure it's something we can fix though.
@gsmet I have a PR open [1] in rest-assured that seems to make a significant difference in the metaspace issue. Regarding Jayway, I was still using RestAssured for making HTTP requests and making simpler assertions about headers, etc., but extracting the result and using Jayway for path assertions about a JSON response. Based on the testing tried so far with the rest-assured PR though, I don't think that will be necessary anymore - TBD.
quarkus-smallrye-openapi-deployment tests with rest-assured 5.5.6:
quarkus-smallrye-openapi-deployment tests with rest-assured PR :
[1] https://github.com/rest-assured/rest-assured/pull/1844
OK, that looks promising. Hopefully they will consider the patch.
If I were you, I would copy/paste these results in the REST Assured PR as they are pretty self-explanatory.
Very nice @MikeEdgar!
Groovy has caused us pain the past - we have a couple hacks in the test codebase but it's clearly not enough
From what I have seen, we have traded one leak for another. At least, for the one disabling ClassValue.
My guess is that we can deal with those easier than we can with Groovy code
While looking at the re-base of https://github.com/rest-assured/rest-assured/pull/1844 and why Groovy 5 causes new leaks, I've found there are several locations in Groovy that start-up "cleaner" threads [1]. Forcing the Groovy artifacts to use parent-first class loading seems to fix it, but I question whether how I'm doing it is correct. The local test simply adds these to the QuarkusBootstrap.Builder in QuarkusUnitTest.
.addParentFirstArtifact(ArtifactKey.of("org.apache.groovy", "groovy", null, "jar"))
.addParentFirstArtifact(ArtifactKey.of("org.apache.groovy", "groovy-json", null, "jar"))
.addParentFirstArtifact(ArtifactKey.of("org.apache.groovy", "groovy-xml", null, "jar"))
Is there a better way to do this (and do you agree parent-first is the right thing for Groovy)?
[1] https://github.com/apache/groovy/commit/0bd13c36649874f7bf4b2c7e98eeef4b2ae55e8d
@MikeEdgar so yeah, I actually considered it when I was experimenting with my previous fix.
I think we need to be consistent between testing and runtime though or you could have all sorts of weird issues. That means we would need to add them to core/runtime/pom.xml with the others.
But what might be a bit of a problem is we probably need to add the whole Groovy stack there and I'm not sure about their dependencies. Now we could start small with just these ones and see how it goes and if people complain.
I propose we just add all the groovy dependencies as parent first in core/runtime/pom.xm
Also, I think we should propose a patch to Groovy for the long term: we need to have a way to stop these threads. It's not a really a good idea to start threads and let them live forever. Once this patch is integrated, we can reconsider if keeping Groovy parent first is the right solution or not.
Ideally we would have a unique entry point to properly shut down all Groovy caches/threads/....
we can reconsider if keeping Groovy parent first is the right solution or not.
Considering how much it has caused us when loading it over and over again, I think it makes sense to do this regardless
Is the loading/reloading an issue outside of the QuarkusUnitTest use case?
I don't think many people use Groovy in Quarkus, so I have no idea
I think we should create a PR with this and then ask the people at https://github.com/quarkiverse/quarkus-groovy for feedback.
I think we should create a PR with this and then ask the people at https://github.com/quarkiverse/quarkus-groovy for feedback.
Is anyone planning on this? If not, I'll open the PR for the 3 Groovy artifacts mentioned above (as a starting point).
@MikeEdgar given you're evaluating the situation with the new REST Assured version, go for it so that you can evaluate if it fixes the issue or not.
Adding the three Groovy artifacts in core/runtime/pom.xml definitely helps with the thread leaking issue. It does not work with rest-assured 5.5.6 on Quarkus main because it breaks the internal ThreadLocal in rest-assured, together with the code the removes the value after each test. However, there is no thread leakage using the changes from my rest-assured PR. I don't think there is much more to do here until the PR is accepted and released.