quarkus icon indicating copy to clipboard operation
quarkus copied to clipboard

io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest fails on MacOS M1

Open holly-cummins opened this issue 1 year ago • 9 comments

Describe the bug

Note: This test has been disabled on M1 as part of #27156

This is the error:

[INFO] Running io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest
2022-09-17 13:00:53,398 INFO  [io.qua.ely.sec.pro.dep.ElytronPropertiesProcessor] (build-12) Configuring from MPRealmConfig
2022-09-17 13:00:53,960 INFO  [io.quarkus] (main) quarkus-smallrye-graphql-deployment 999-SNAPSHOT on JVM (powered by Quarkus 999-SNAPSHOT) started in 0.336s. Listening on: http://localhost:8081
2022-09-17 13:00:53,960 INFO  [io.quarkus] (main) Profile test activated. 
2022-09-17 13:00:53,960 INFO  [io.quarkus] (main) Installed features: [cdi, hibernate-validator, jaeger, resteasy-reactive, security, security-properties-file, smallrye-context-propagation, smallrye-graphql, smallrye-metrics, smallrye-opentracing, vertx]
2022-09-17 13:00:57,265 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-29) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files
[zillions of repeated log warnings]
2022-09-17 13:00:57,278 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-32) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files
2022-09-17 13:00:57,278 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-32) Retrying connect to {}->http://localhost:8081
2022-09-17 13:00:58,267 WARN  [io.net.cha.DefaultChannelPipeline] (vert.x-acceptor-thread-0) An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
        at java.base/sun.nio.ch.Net.accept(Native Method)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:425)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:119)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:116)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
        at io.netty.util.internal.SocketUtils.accept(SocketUtils.java:116)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:154)
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:79)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:487)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:833)

2022-09-17 13:00:59,268 WARN  [io.net.cha.DefaultChannelPipeline] (vert.x-acceptor-thread-0) An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
        at java.base/sun.nio.ch.Net.accept(Native Method)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:425)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:119)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:116)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
        at io.netty.util.internal.SocketUtils.accept(SocketUtils.java:116)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:154)
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:79)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:487)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:833)

2022-09-17 13:00:59,365 INFO  [io.quarkus] (main) quarkus-smallrye-graphql-deployment stopped in 0.087s
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.169 s <<< FAILURE! - in io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest
[ERROR] io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest.concurrentAllFilmsOnly  Time elapsed: 5.291 s  <<< ERROR!

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

Undo the disabling of the test, and then

mvn verify -f extensions/smallrye-graphql/deployment

Also can be reproduced on the Mac CI.

Output of uname -a or ver

No response

Output of java -version

No response

GraalVM version (if different from Java)

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

holly-cummins avatar Sep 17 '22 16:09 holly-cummins

/cc @Ladicek, @gastaldi, @jmartisk, @phillip-kruger, @radcortez

quarkus-bot[bot] avatar Sep 17 '22 16:09 quarkus-bot[bot]

I've seen this error happen sporadically with other tests as well.

Can you reproduce this issue when running this single test only?

gastaldi avatar Sep 17 '22 16:09 gastaldi

Yeah I wonder if we are hitting the infamous file descriptor exhaustion issue you all had with macOS.

gsmet avatar Sep 17 '22 16:09 gsmet

I didn't raise a defect until I'd reproduced locally .... but now I can't reproduce locally (even after un-disabling). So I suspect Guillaume may be right about the file descriptor issue.

holly-cummins avatar Sep 17 '22 16:09 holly-cummins

I just built the graphql extension on my M1 - no problem.

cescoffier avatar Sep 18 '22 17:09 cescoffier

I un-disabled the test, ran a build, and same thing - the build overall was a bit unhappy and hung. That complicates diagnosis, but https://github.com/quarkusio/quarkus/actions/runs/3076919532/jobs/4971564330 shows too many open files in smallrye graphql deployment. I'll add the guard again, with an extra condition to only disable things in CI. We certainly won't have less M1 coverage with that than we do now.

2022-09-18T12:25:23.8743260Z 2022-09-18 08:25:23,873 INFO  [io.quarkus] (main) Profile test activated. 
2022-09-18T12:25:23.8743750Z 2022-09-18 08:25:23,873 INFO  [io.quarkus] (main) Installed features: [cdi, hibernate-validator, jaeger, resteasy-reactive, security, security-properties-file, smallrye-context-propagation, smallrye-graphql, smallrye-metrics, smallrye-opentracing, vertx]
2022-09-18T12:25:26.4024660Z 2022-09-18 08:25:26,210 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-3) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files

(When I had the test disabled, I had a CI run which was clean apart from #28035. So there's clearly some intermittency, and possibly a machine-specific issue, but it seems to be specific to that test (based on the sample of 4).)

holly-cummins avatar Sep 18 '22 17:09 holly-cummins

the too many open files is just happening at that point because we reach the limit. So, it's either a leak somewhere or just life (Maven opens lots of files).

It would be interesting to run the build with a profile or collecting JFR events (there is probably an even tracking open / close files)

You may need to configure your ulimit - but that's something I generally avoid as, after that, you may have a different behavior during load tests (as you will be able to open more connections)

cescoffier avatar Sep 19 '22 06:09 cescoffier

I agree that it happens at that point because we hit the limit, but the interesting thing is that we don't hit the limit at all without that test. So that test/underlying code must be particularly hungry (or leaky) in its file consumption. The good news is it perhaps narrows down the scope of what we will need to profile, following @cescoffier's suggestions.

This issue can track the investigation to decide whether there's something we can/should change on our side, or whether the test and implementation are hungry-but-correct and we need to adjust the machine configuration.

holly-cummins avatar Sep 19 '22 08:09 holly-cummins

That is weird, I would need to track how many file descriptors are opened by this test or if we have a leak.

cescoffier avatar Sep 19 '22 11:09 cescoffier