io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest fails on MacOS M1
Describe the bug
Note: This test has been disabled on M1 as part of #27156
This is the error:
[INFO] Running io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest
2022-09-17 13:00:53,398 INFO [io.qua.ely.sec.pro.dep.ElytronPropertiesProcessor] (build-12) Configuring from MPRealmConfig
2022-09-17 13:00:53,960 INFO [io.quarkus] (main) quarkus-smallrye-graphql-deployment 999-SNAPSHOT on JVM (powered by Quarkus 999-SNAPSHOT) started in 0.336s. Listening on: http://localhost:8081
2022-09-17 13:00:53,960 INFO [io.quarkus] (main) Profile test activated.
2022-09-17 13:00:53,960 INFO [io.quarkus] (main) Installed features: [cdi, hibernate-validator, jaeger, resteasy-reactive, security, security-properties-file, smallrye-context-propagation, smallrye-graphql, smallrye-metrics, smallrye-opentracing, vertx]
2022-09-17 13:00:57,265 INFO [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-29) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files
[zillions of repeated log warnings]
2022-09-17 13:00:57,278 INFO [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-32) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files
2022-09-17 13:00:57,278 INFO [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-32) Retrying connect to {}->http://localhost:8081
2022-09-17 13:00:58,267 WARN [io.net.cha.DefaultChannelPipeline] (vert.x-acceptor-thread-0) An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
at java.base/sun.nio.ch.Net.accept(Native Method)
at java.base/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:425)
at java.base/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:119)
at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:116)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
at io.netty.util.internal.SocketUtils.accept(SocketUtils.java:116)
at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:154)
at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:79)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:487)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
2022-09-17 13:00:59,268 WARN [io.net.cha.DefaultChannelPipeline] (vert.x-acceptor-thread-0) An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
[same stack trace as above]
2022-09-17 13:00:59,365 INFO [io.quarkus] (main) quarkus-smallrye-graphql-deployment stopped in 0.087s
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.169 s <<< FAILURE! - in io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest
[ERROR] io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest.concurrentAllFilmsOnly Time elapsed: 5.291 s <<< ERROR!
Expected behavior
No response
Actual behavior
No response
How to Reproduce?
Re-enable the test, then run:
mvn verify -f extensions/smallrye-graphql/deployment
It can also be reproduced on the Mac CI.
Output of uname -a or ver
No response
Output of java -version
No response
GraalVM version (if different from Java)
No response
Quarkus version or git rev
No response
Build tool (i.e. output of mvnw --version or gradlew --version)
No response
Additional information
No response
/cc @Ladicek, @gastaldi, @jmartisk, @phillip-kruger, @radcortez
I've seen this error happen sporadically with other tests as well.
Can you reproduce this issue when running this single test only?
Yeah I wonder if we are hitting the infamous file descriptor exhaustion issue you all had with macOS.
I didn't raise a defect until I'd reproduced it locally ... but now I can't reproduce it locally (even after un-disabling the test). So I suspect Guillaume may be right about the file descriptor issue.
I just built the graphql extension on my M1 - no problem.
I un-disabled the test and ran a build, and got the same thing - the build overall was a bit unhappy and hung. That complicates diagnosis, but https://github.com/quarkusio/quarkus/actions/runs/3076919532/jobs/4971564330 shows "too many open files" in the smallrye-graphql deployment module. I'll add the guard again, with an extra condition that only disables the test in CI. We certainly won't have less M1 coverage with that than we do now.
2022-09-18T12:25:23.8743260Z 2022-09-18 08:25:23,873 INFO [io.quarkus] (main) Profile test activated.
2022-09-18T12:25:23.8743750Z 2022-09-18 08:25:23,873 INFO [io.quarkus] (main) Installed features: [cdi, hibernate-validator, jaeger, resteasy-reactive, security, security-properties-file, smallrye-context-propagation, smallrye-graphql, smallrye-metrics, smallrye-opentracing, vertx]
2022-09-18T12:25:26.4024660Z 2022-09-18 08:25:26,210 INFO [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-3) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files
(When I had the test disabled, I had a CI run which was clean apart from #28035. So there's clearly some intermittency, and possibly a machine-specific issue, but it seems to be specific to that test (based on the sample of 4).)
The "too many open files" is just happening at that point because we reach the limit. So it's either a leak somewhere, or just life (Maven opens lots of files).
It would be interesting to run the build under a profiler, or to collect JFR events (there is probably an event tracking open/close of files).
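The JFR suggestion above could be tried with the streaming consumer API (JDK 14+). This is a minimal sketch, not part of the actual investigation: JFR does not expose a dedicated "file open" event, so jdk.FileRead and jdk.SocketRead are used here as proxies for descriptor activity. The class name is made up for illustration.

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

// Sketch: stream JFR events live from the current JVM to see which
// threads touch files/sockets. Assumes JDK 14+ (RecordingStream).
// Note: jdk.FileRead / jdk.SocketRead are proxies; JFR has no
// dedicated "file open" event.
public class FdActivityProbe {
    public static void main(String[] args) throws Exception {
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.FileRead").withThreshold(Duration.ZERO);
            rs.enable("jdk.SocketRead").withThreshold(Duration.ZERO);
            rs.onEvent("jdk.FileRead", e ->
                    System.out.println("file read: " + e.getString("path")));
            rs.onEvent("jdk.SocketRead", e ->
                    System.out.println("socket read by "
                            + e.getThread().getJavaName()));
            rs.startAsync();
            // ... run the suspect workload here; a file read below just
            // demonstrates that events flow ...
            java.nio.file.Files.readAllBytes(
                    java.nio.file.Path.of("/etc/hosts"));
            Thread.sleep(1500); // give the stream time to flush events
        }
    }
}
```

Alternatively, the same data can be captured without code changes by starting the build with -XX:StartFlightRecording and inspecting the recording afterwards.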
You may need to configure your ulimit - but that's something I generally avoid because, after that, you may see different behavior during load tests (as you will be able to open more connections).
I agree that it happens at that point because we hit the limit, but the interesting thing is that we don't hit the limit at all without that test. So that test/underlying code must be particularly hungry (or leaky) in its file consumption. The good news is it perhaps narrows down the scope of what we will need to profile, following @cescoffier's suggestions.
This issue can track the investigation to decide whether there's something we can/should change on our side, or whether the test and implementation are hungry-but-correct and we need to adjust the machine configuration.
That is weird; I would need to track how many file descriptors are opened by this test, and whether we have a leak.
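One cheap way to track this (sketched below, not from the actual investigation) is to sample the JVM's own open-descriptor count before and after the suspect test. On Unix-like platforms the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean, which exposes exactly this; the class and method names here are hypothetical helpers.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

// Sketch: sample how many file descriptors this JVM has open, e.g.
// before/after a suspect test body. Works on macOS/Linux JVMs where
// the OS MXBean implements UnixOperatingSystemMXBean.
public class FdCounter {
    // Number of descriptors currently open in this JVM, or -1 where
    // the Unix MXBean is not available (e.g. Windows).
    public static long openFds() {
        var os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean unix) {
            return unix.getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = openFds();
        // ... run the suspect workload here ...
        long after = openFds();
        System.out.println("open fds: " + before + " -> " + after);
    }
}
```

Since Quarkus deployment tests run server and test client in the same JVM, a steadily climbing delta across repeated runs of the test would point at a leak rather than ordinary hunger.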