quarkus icon indicating copy to clipboard operation
quarkus copied to clipboard

No server chosen by com.mongodb.reactivestreams: Failed looking up SRV record

Open charalampos-micholias opened this issue 1 year ago • 10 comments

Describe the bug

Issue using Quarkus 2.15.0 version

When trying a connection attempt in a mongo shard through a Quarkus application version >= 2.15.0 below error is propagated:

" _2023-11-24 10:07:53,815 INFO [com.vod.dxl.gr.sys.uti.SRVQuery] (vert.x-eventloop-thread-0) 0 0 27017 dxl-gr-preprod-shard-00-02-nuler.mongodb.net. 2023-11-24 10:07:53,907 INFO [com.vod.dxl.gr.sys.uti.SRVQuery] (vert.x-eventloop-thread-0) 0 0 27017 dxl-gr-preprod-shard-00-00-nuler.mongodb.net. 2023-11-24 10:07:53,908 INFO [com.vod.dxl.gr.sys.uti.SRVQuery] (vert.x-eventloop-thread-0) 0 0 27017 dxl-gr-preprod-shard-00-01-nuler.mongodb.net. 2023-11-24 10:07:56,108 INFO [org.mon.dri.cluster] (executor-thread-1) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$441/0x0000000050de1ef0@11222f0c from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out 2023-11-24 10:07:59,508 INFO [org.mon.dri.cluster] (cluster-656075d114c7f278b4f504d7) No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb.tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out "

Issue using Quarkus 3.1.0 version

As described on #32991 this issue is resolved in "Quarkus 3.0.2.Final" version

Based on the aforementioned proposal, upon upgrading on Quarkus "3.1.0.Final" a new error (resulting to same mongo connection inability) takes place.

Below you may find exact logs: " 07:24:02 INFO traceId=afd520fa41b8e64ab71320ef860625cb, parentId=, spanId=914390a5279baf32, sampled=true [or.mo.dr.cluster] (vert.x-eventloop-thread-0) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$771/0x0000000064382208@64e7bf16 from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]}. Waiting for 30000 ms before timing out "

Dedicated Script for further logs: In attempt to have further view on Mongo SRV attributes we have created a dedicated script - triggered each time mongo data retrieval- is attempted that provides below logs: " _07:03:36 INFO traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) Providing SRV attributes logs.... 07:03:36 INFO traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) 0 0 27017 dxl-gr-preprod-shard-00-01-nuler.mongodb.net. 07:03:36 INFO traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) 0 0 27017 dxl-gr-preprod-shard-00-02-nuler.mongodb.net. 07:03:36 INFO traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) 0 0 27017 dxl-gr-preprod-shard-00-00-nuler.mongodb.net.

07:24:08 INFO traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [or.mo.dr.cluster] (executor-thread-3) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$771/0x0000000064382208@53b297d6 from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=dxl-gr-preprod-shard-00-02-nuler.mongodb.net:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=dxl-gr-preprod-shard-00-01-nuler.mongodb.net:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=dxl-gr-preprod-shard-00-00.nuler.mongodb.net:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out_ "

Expected behavior

Quarkus should connect to Mongo Atlas without this SRV lookup or MongoSocketOpenException error

Actual behavior

Connection failure takes place when reaching the established time thresshold "Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]}. Waiting for 30000 ms before timing out"

How to Reproduce?

Using the free tier mongo atlas account,

  • use the above mentioned quarkus versions (2.15.0 and 3.1.0 )
  • specify the mongo atlas cluster url in application.properties.
  • Then, write a simple rest service that when invoked connects to mongo to retrieve records.

Output of uname -a or ver

MINGW64_NT-10.0-22621 E0624331 3.3.3-341.x86_64 2022-01-17 11:45 UTC x86_64 Msys

Output of java -version

java version "17.0.7" 2023-04-18 LTS

Quarkus version or git rev

2.15.0 and 3.1.0

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

Dedicated script getMongoDebugLogs() :

public class SRVQuery {

private static final Logger log = LoggerFactory.getLogger(SRVQuery.class);
public static void getMongoDebugLogs() {
    try {
        // Set up the environment for creating the initial context
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns:");
        DirContext ctx = new InitialDirContext(env);
        // Specify the name of the domain to query
        String domainName = "_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net";
        // Perform the SRV lookup
        Attributes attrs = ctx.getAttributes(domainName, new String[]{"SRV"});
        Attribute attr = attrs.get("SRV");
        log.info("Providing SRV attributes logs....");
        if (attr != null) {
            for (int i = 0; i < attr.size(); i++) {
                log.info((String) attr.get(i));
            }
        } else {
            log.info("No SRV records found for " + domainName);
        }
        ctx.close();
    } catch (NamingException e) {
        log.info("Handing mongo debugging exception..");
        e.printStackTrace();
        log.info(e.getMessage());
    }
}

}

charalampos-micholias avatar Dec 08 '23 13:12 charalampos-micholias

/cc @evanchooly (mongodb), @loicmathieu (mongodb)

quarkus-bot[bot] avatar Dec 08 '23 13:12 quarkus-bot[bot]

Hi, Can you try with the latest Quarkus version, it should be 3.6.1. If it still fails, can you provide a reproducer project?

loicmathieu avatar Dec 08 '23 14:12 loicmathieu

Hello, good afternoon, After several further tests we have identified the exact pattern that this issue occurs. More specific issue occurs on the below combination:

  • Quarkus version>= 2.15.0 (including Quarkus 3.1.0) and
  • Mongo connection through connection string formatted as : mongodb+srv://USERNAME:[email protected]

keep as important note -as a tested workaround- that issue does not appear upon combining Quarkus 3.1.0 and connection string containing exact mongo shards formatted as: mongodb://USERNAME:PASSWORD@XXXXX-shard-00-00.nuler.mongodb.net:27017,XXXXX-shard-00-01-nuler.mongodb.net:27017/

So after many thorough test seems that even Quarkus 3.1.0 version faces inability to establish connection when a SRV mongo connection string is provided.

Exact logs for aforementioned combination:

15:09:53 INFO  traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (cluster-6578779b5cf3934a85277080) No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out

15:09:56 INFO  traceId=0caf882656bffbdbd9cae0ca6fbfeb3f, parentId=, spanId=c9077c52e865326b, sampled=false [or.mo.dr.cluster] (executor-thread-2) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$790/0x00000000903f70f8@4180c2b from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out

15:09:57 INFO  traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (cluster-6578779b5cf3934a85277080) No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out_

_15:10:23 ERROR traceId=ad5511d672b3e0370c637d1fb918f448, parentId=, spanId=12c2d61cc5444467, sampled=true [io.qu.mu.ru.MutinyInfrastructure] (vert.x-eventloop-thread-0) Mutiny had to drop the following exception: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., servers=[]
	at com.mongodb.internal.connection.BaseCluster.createTimeoutException(BaseCluster.java:428)
	at com.mongodb.internal.connection.BaseCluster.handleServerSelectionRequest(BaseCluster.java:324)
	at com.mongodb.internal.connection.BaseCluster.access$700(BaseCluster.java:69)
	at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:503)
	at java.base/java.lang.Thread.run(Unknown Source)

As upgrading further in 3.6.1 will require extended effort in a series of services, would you please update us if there are implemented enhancements in mongo connection establishment area in Quarkus 3.6.1 that could resolve the above issue and which are they in high level?

Thank you all for your assistance

charalampos-micholias avatar Dec 12 '23 21:12 charalampos-micholias

Hello, we are experiencing similar issues with Quarkus 3.5.3, we have these stacktraces showing every once in a while when connected to a MongoDB Atlas cluster: Unable to look up SRV record for host _mongodb._tcp.option-pl-0.91p5n.mongodb.net we are using the reactive client. Sorry I don't have th full stacktrace, can't find a way to export it cleanly for datadog.

va1erian avatar Dec 13 '23 11:12 va1erian

hello all! @loicmathieu what kind of corrections are available on 3.6.1?

roiar avatar Jan 04 '24 14:01 roiar

Hello @loicmathieu, we would like to ask if there is an ongoing investigation on the above issue. Thank you,

charalampos-micholias avatar Jan 08 '24 14:01 charalampos-micholias

Facing the same problem here

danielspeixoto avatar Jun 11 '24 00:06 danielspeixoto

@va1erian @roiar @charalampos-micholias in JVM mode we use the default MongoDB DNS resolver, nothing specific to Quarkus. However, in native mode there is an experimental DNS resolver as the MongoDB DNS resolver uses JNI which is not supported in native, this is explained here: https://quarkus.io/guides/mongodb#using-mongosrv-urls

Did you deploy to native and use the experimental DNS resolver?

loicmathieu avatar Jun 13 '24 17:06 loicmathieu

@loicmathieu we deploy a regular JVM without touching the dns resolver configuration. However note that this issue disappeared at one point when we upgraded Quarkus, I can't recall the exact version but it has been a couple of month already. We are currently running Quarkus 3.10

va1erian avatar Jun 13 '24 18:06 va1erian

We're facing exactly the same issue running Quarkus 3.10. Don't know if this is relevant, but we're using JVM Images for Java 17, so, if I read the documentation correctly, it should be this base image: registry.access.redhat.com/ubi8/openjdk-17-runtime:1.19

Relevant Log Messages:

Exception while resolving SRV records
No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.some.name.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out

If we remove the srv Record from the config and add all the single database entries, it is working.

Tomatenjoghurt avatar Jun 26 '24 13:06 Tomatenjoghurt

See https://github.com/quarkusio/quarkus/discussions/31971#discussioncomment-10613900 for a possible workaround

geoand avatar Sep 11 '24 13:09 geoand

We believe that #43222 fixes this, so I will close it for now, but if the issue persists with 3.15 (which will contain the fix), we can reopen

geoand avatar Sep 13 '24 10:09 geoand

Hello @geoand , we are still facing the issue with Quarkus 3.14.4 (native), the only workaround we were able to apply with major effects, was ditching mongo+srv:// in favor of simple mongo:// connection string. (ref. we used: https://stackoverflow.com/questions/78713411/com-mongodb-mongoconfigurationexception-failed-looking-up-txt-record-for-host)

We inspected the solution in #43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.

We contacted Mongo Atlas support to little help, except an upsell that we bought, and the problem persisted.

Additional info: We have at least 4 Quarkus microservices running 3.8.6 (LTS) (Native) and 3.14 that are not affected by the issue.

Gabriel-Longo-at-DB1 avatar Sep 16 '24 19:09 Gabriel-Longo-at-DB1

We inspected the solution in https://github.com/quarkusio/quarkus/pull/43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.

Can you elaborate on this a little more please?

geoand avatar Sep 17 '24 05:09 geoand

We inspected the solution in #43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.

Can you elaborate on this a little more please?

Thanks for the reply, I'll try to elaborate a bit more:

Application will emit the following exception for a couple o seconds, then health-check will fail and app restarted, then we will randomly face the same problem for a few minutes, until connection is successful, then the cycle repeats, so in this case it seems the retry (part of the solution of #43222 ) doesn't work because the srv resolution will persistently fail for minutes at times.

Mutiny had to drop the following exception [Error Occurred After Shutdown]: com.mongodb.MongoTimeoutException: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.<our-mongo-sub-domain>.mongodb.net'., servers=[]
	at com.mongodb.internal.connection.BaseCluster.createAndLogTimeoutException(BaseCluster.java:392)
	at com.mongodb.internal.connection.BaseCluster.selectServer(BaseCluster.java:148)
	at com.mongodb.internal.connection.AbstractMultiServerCluster.selectServer(AbstractMultiServerCluster.java:54)
	at com.mongodb.internal.binding.ClusterBinding.getReadConnectionSource(ClusterBinding.java:108)
	at com.mongodb.client.internal.ClientSessionBinding.getConnectionSource(ClientSessionBinding.java:128)
	at com.mongodb.client.internal.ClientSessionBinding.getReadConnectionSource(ClientSessionBinding.java:92)
	at com.mongodb.internal.operation.SyncOperationHelper.withSuppliedResource(SyncOperationHelper.java:141)
	at com.mongodb.internal.operation.SyncOperationHelper.withSourceAndConnection(SyncOperationHelper.java:122)
	at com.mongodb.internal.operation.SyncOperationHelper.lambda$executeRetryableRead$4(SyncOperationHelper.java:186)
	at com.mongodb.internal.operation.SyncOperationHelper.lambda$decorateReadWithRetries$12(SyncOperationHelper.java:289)
	at com.mongodb.internal.async.function.RetryingSyncSupplier.get(RetryingSyncSupplier.java:67)
	at com.mongodb.internal.operation.SyncOperationHelper.executeRetryableRead(SyncOperationHelper.java:191)
	at com.mongodb.internal.operation.SyncOperationHelper.executeRetryableRead(SyncOperationHelper.java:173)
	at com.mongodb.internal.operation.CommandReadOperation.execute(CommandReadOperation.java:48)
	at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:153)
	at com.mongodb.client.internal.MongoDatabaseImpl.executeCommand(MongoDatabaseImpl.java:196)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:165)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:160)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:150)
	at io.quarkus.mongodb.health.MongoHealthCheck$MongoClientCheck$1.get(MongoHealthCheck.java:185)
	at io.quarkus.mongodb.health.MongoHealthCheck$MongoClientCheck$1.get(MongoHealthCheck.java:182)
	at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:28)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:635)
	at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1495)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at [email protected]/java.lang.Thread.run(Thread.java:840)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:885)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:860)

this shorter stack trace is also logged:

Mutiny had to drop the following exception: com.mongodb.MongoTimeoutException: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.<our-mongo-sub-domain>.mongodb.net'., servers=[]
	at com.mongodb.internal.connection.BaseCluster.createAndLogTimeoutException(BaseCluster.java:392)
	at com.mongodb.internal.connection.BaseCluster.handleServerSelectionRequest(BaseCluster.java:293)
	at com.mongodb.internal.connection.BaseCluster.access$800(BaseCluster.java:85)
	at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:475)
	at [email protected]/java.lang.Thread.run(Thread.java:840)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:885)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:860)

We are also trying find if there are any problems with DNS resolution in our infrastructure, but so far, nothing.

Edit: seems to be related to https://github.com/quarkusio/quarkus/issues/32313

Gabriel-Longo-at-DB1 avatar Sep 17 '24 12:09 Gabriel-Longo-at-DB1

I wonder if quarkus.mongodb.dns.lookup-timeout will help in your case.

geoand avatar Sep 17 '24 12:09 geoand

I wonder if quarkus.mongodb.dns.lookup-timeout will help in your case.

Great, we will try-it out and report the results, thank you very much!

Gabriel-Longo-at-DB1 avatar Sep 17 '24 12:09 Gabriel-Longo-at-DB1

🙏🏽

geoand avatar Sep 17 '24 12:09 geoand

We are facing the same issue

Exception while resolving SRV records: com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb...'

Caused by: com.mongodb.MongoConfigurationException: Unable to look up SRV record for host _mongodb._tcp.domain.net

this is with Quarkus version 3.17.2, and Java 17. I have tried the lookup timeout option with no luck.

Any help appreciated.

ssonnad avatar Dec 04 '24 20:12 ssonnad

We are also facing the same issue with srv

at io.quarkus.mongodb.runtime.dns.MongoDnsClient.resolveTxtRequest(MongoDnsClient.java:175) at io.quarkus.mongodb.runtime.dns.MongoDnsClient.getResourceRecordData(MongoDnsClient.java:106) at com.mongodb.internal.dns.DefaultDnsResolver.resolveAdditionalQueryParametersFromTxtRecords(DefaultDnsResolver.java:118)

gsantoshgit avatar Feb 27 '25 17:02 gsantoshgit

Can you show the entire stacktrace please?

geoand avatar Feb 28 '25 06:02 geoand