ApplicationInsights-Java agent should provide option to disable host name verification

Open givanovmras opened this issue 1 year ago • 8 comments

Expected behavior

An option to disable host name verification during SSL Handshake with either the ingestion or live metrics endpoints.

Actual behavior

When the AI connection string points to the ingestion or live metrics endpoints in a PEP (private endpoint) setup, the certificates provided by Microsoft/Azure do not match the specific endpoint domain name. As a result, the agent fails to connect and reports the SSL error "No subject alternative DNS name matching <pep_ingestion_or_live_metrics_endpoint_domain_name> found".

To Reproduce

This can only be reproduced by using ingestion/live metrics endpoints that have a domain name mismatch. Configure the AI connection string to point to those as normal, then start the agent and observe the following error:

javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching <pep_ingestion_or_live_metrics_endpoint_domain_name> found.
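
For context, this exception comes from the JDK's host name check, which compares the requested host against the DNS entries in the certificate's subject alternative names (SANs). The sketch below is a simplified illustration of that comparison, not the JDK's actual HostnameChecker. The class name, the wildcard SAN, and the private host name are hypothetical and chosen only for the example.

```java
import java.util.List;
import java.util.Locale;

// Simplified, illustrative SAN matching. NOT the JDK implementation.
class SanMatchSketch {

    // Exact match, or a single leading "*." wildcard that covers exactly one label.
    static boolean matches(String host, String sanDnsName) {
        String h = host.toLowerCase(Locale.ROOT);
        String p = sanDnsName.toLowerCase(Locale.ROOT);
        if (p.startsWith("*.")) {
            int firstDot = h.indexOf('.');
            // Compare everything after the first label with the SAN suffix.
            return firstDot > 0 && h.substring(firstDot).equals(p.substring(1));
        }
        return h.equals(p);
    }

    // True if any SAN entry of the certificate matches the host.
    static boolean anyMatch(String host, List<String> sanDnsNames) {
        return sanDnsNames.stream().anyMatch(san -> matches(host, san));
    }

    public static void main(String[] args) {
        // Hypothetical SAN list; real certificates may differ.
        List<String> sans = List.of("*.in.applicationinsights.azure.com");
        System.out.println(anyMatch("southeastasia-0.in.applicationinsights.azure.com", sans)); // true
        System.out.println(anyMatch("my-private-endpoint.example.internal", sans));             // false
    }
}
```

When no SAN entry matches the host the client asked for, the handshake is rejected with the "No subject alternative DNS name matching ... found" message shown above.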

System information

SDK Version: OpenJDK 11
OS type and version: Docker image based on OpenJDK 11 with Apache Tomcat 9.0.43 installed and configured
Using spring-boot: No
Additional relevant libraries (with version, if applicable): application insights java agent v3.1.1 used (applicationinsights-agent-3.1.1.jar)

Logs

2024-10-11T14:06:43.435614241Z 2024-10-11 14:06:43.434Z WARN c.a.m.o.e.i.p.TelemetryPipeline - Sending telemetry to the ingestion service: No subject alternative DNS name matching <pep_ingestion_or_live_metrics_endpoint_domain_name> found. (https://<pep_ingestion_or_live_metrics_endpoint_domain_name>/v2.1/track) (telemetry will be stored to disk and retried) (future warnings will be aggregated and logged once every 5 minutes)
2024-10-11T14:06:43.435658442Z javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching <pep_ingestion_or_live_metrics_endpoint_domain_name> found.
2024-10-11T14:06:43.435664442Z at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
2024-10-11T14:06:43.435678842Z at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:360)
2024-10-11T14:06:43.435683742Z at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:303)
2024-10-11T14:06:43.435688342Z at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:298)
2024-10-11T14:06:43.435692242Z at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357)
2024-10-11T14:06:43.435696042Z at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232)
2024-10-11T14:06:43.435703342Z at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175)
2024-10-11T14:06:43.435707543Z at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
2024-10-11T14:06:43.435712943Z at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
2024-10-11T14:06:43.435718843Z at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1076)
2024-10-11T14:06:43.435722843Z at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1063)
2024-10-11T14:06:43.435727243Z at java.base/java.security.AccessController.doPrivileged(Native Method)
2024-10-11T14:06:43.435731143Z at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1010)
2024-10-11T14:06:43.435735243Z at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1651)
2024-10-11T14:06:43.435739543Z at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1497)
2024-10-11T14:06:43.435743443Z at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
2024-10-11T14:06:43.435747143Z at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
2024-10-11T14:06:43.435764443Z at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
2024-10-11T14:06:43.435769243Z at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
2024-10-11T14:06:43.435773344Z at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
2024-10-11T14:06:43.435777844Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
2024-10-11T14:06:43.435781844Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
2024-10-11T14:06:43.435787544Z at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
2024-10-11T14:06:43.435791644Z at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
2024-10-11T14:06:43.435795644Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
2024-10-11T14:06:43.435799544Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
2024-10-11T14:06:43.435806044Z at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
2024-10-11T14:06:43.435809544Z at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
2024-10-11T14:06:43.435812244Z at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:425)
2024-10-11T14:06:43.435815044Z at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
2024-10-11T14:06:43.435817644Z at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
2024-10-11T14:06:43.435820344Z at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
2024-10-11T14:06:43.435822844Z at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413)
2024-10-11T14:06:43.435825344Z at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
2024-10-11T14:06:43.435828044Z at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
2024-10-11T14:06:43.435830744Z at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2024-10-11T14:06:43.435833344Z at java.base/java.lang.Thread.run(Thread.java:829)
2024-10-11T14:06:43.435836145Z Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching <pep_ingestion_or_live_metrics_endpoint_domain_name> found.
2024-10-11T14:06:43.435840045Z at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212)
2024-10-11T14:06:43.435842645Z at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:103)
2024-10-11T14:06:43.435845145Z at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:461)
2024-10-11T14:06:43.435851645Z at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:421)
2024-10-11T14:06:43.435854345Z at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:283)
2024-10-11T14:06:43.435857145Z at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141)
2024-10-11T14:06:43.435859845Z at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1335)
2024-10-11T14:06:43.435862645Z ... 32 common frames omitted

Screenshots

N/A

givanovmras avatar Oct 14 '24 11:10 givanovmras

The live metrics and ingestion endpoints are retrieved directly from your connection string; you can't modify them. I don't understand your ask here.

Additionally, you're using an older version of the Java agent, 3.1.1. Can you try 3.6.1?

source code: https://github.com/Azure/azure-sdk-for-java/blob/d10d62bf66f26b2c0dfc8bea105507c2de309740/sdk/monitor/azure-monitor-opentelemetry-exporter/src/main/java/com/azure/monitor/opentelemetry/exporter/implementation/configuration/ConnectionStringBuilder.java#L79

heyams avatar Oct 16 '24 19:10 heyams
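
To illustrate how the endpoint host names come out of the connection string, here is a minimal sketch. It is not the exporter's ConnectionStringBuilder linked above; the class name and the sample values are made up, and only the key names IngestionEndpoint and LiveEndpoint follow the Application Insights connection string format.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser for "Key=Value;Key=Value" connection strings.
// NOT the parser used by the agent or the exporter.
class ConnectionStringSketch {

    // Split on ';' and then on the first '=' of each segment.
    static Map<String, String> parse(String connectionString) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String part : connectionString.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                result.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return result;
    }

    // The host is the name the TLS client will verify against the certificate.
    static String host(String endpointUrl) {
        return URI.create(endpointUrl).getHost();
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        String cs = "InstrumentationKey=00000000-0000-0000-0000-000000000000;"
                + "IngestionEndpoint=https://example-region.in.applicationinsights.azure.com/;"
                + "LiveEndpoint=https://example-region.livediagnostics.monitor.azure.com/";
        Map<String, String> parsed = parse(cs);
        System.out.println("ingestion host: " + host(parsed.get("IngestionEndpoint")));
        System.out.println("live metrics host: " + host(parsed.get("LiveEndpoint")));
    }
}
```

Whatever host appears in these two values is the name that must be covered by the certificate the endpoint presents.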

Hi Helen. The ask is to have the ability to disable host name verification as part of the SSL handshake to allow it to connect to the ingestion/live metrics endpoints. These endpoints do not have the correct certificate (CN is set to a different name, compared to the one the endpoint uses), so if host name verification is enabled, the connection fails.

I don't think upgrading to 3.6.1 will solve this issue, unless an option was added for ignoring the host name verification.

Thanks.

givanovmras avatar Oct 17 '24 07:10 givanovmras
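
For readers unfamiliar with the term: in standard JSSE, host name verification is controlled by the endpoint identification algorithm on SSLParameters. The sketch below shows only that JDK-level setting. It is not an option of the Application Insights agent (the thread does not mention one), and the class name, method name, and host name are hypothetical.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Illustrates the JSSE switch behind "host name verification".
// This is plain JDK API usage, not agent configuration.
class EndpointIdentificationSketch {

    // "HTTPS" makes JSSE check the certificate against the peer host name;
    // null leaves endpoint identification off, so only the trust chain is checked.
    static SSLParameters paramsFor(boolean verifyHostName) {
        SSLParameters params = new SSLParameters();
        params.setEndpointIdentificationAlgorithm(verifyHostName ? "HTTPS" : null);
        return params;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical host; no connection is opened here.
        SSLEngine engine = SSLContext.getDefault().createSSLEngine("ingestion.example.internal", 443);
        engine.setUseClientMode(true);

        SSLParameters params = paramsFor(true);
        engine.setSSLParameters(params);
        System.out.println("endpoint identification: " + params.getEndpointIdentificationAlgorithm());
    }
}
```

With the algorithm left unset, a client would accept any trusted certificate regardless of the name it was issued for, which is why such a switch is normally treated as a last resort.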

> These endpoints do not have the correct certificate (CN is set to a different name, compared to the one the endpoint uses)

@givanovmras can you share the endpoint(s) where you are seeing this problem so we can investigate?

trask avatar Oct 17 '24 14:10 trask

@trask I can't share the endpoints DN as these are privately created from within the portal. I am not sure whether you'd like me to share the CN of both certificates either, as they may be internal to Azure/Microsoft.

Regardless, as already stated above, we'd like to have the ability to disable the host name verification for the agent to avoid this problem happening in the first place. Asking Azure/Microsoft to change their own certificates is going to be subject to a separate support request and it will take time to resolve (if ever).

Thanks.

givanovmras avatar Oct 18 '24 07:10 givanovmras

Not sure if it is related, but we recently started seeing the same SSL handshake error, which prevents our service from sending telemetry to AI.

logs: In the last 5 minutes, the following operation has failed 10 times (out of 10): Sending telemetry to the ingestion service (retry from disk): Failed to create SSL connection (https://southeastasia-0.in.applicationinsights.azure.com/v2.1/track) (will be retried again) (10 times)

We do not use the agent; we use Quarkus + the OTel SDK instead, and the error suddenly appeared a few days ago.

hungchu0912 avatar Oct 18 '24 07:10 hungchu0912

I concur with @hungchu0912 - our endpoint host name is very similar. The certificate presented to the agent has a different CN (more generic in nature, as I understand it), hence the verification fails and the agent cannot connect to the endpoint.

givanovmras avatar Oct 18 '24 08:10 givanovmras

In my case, the SANs look correct to me. Also, if we roll back to the app version built last month (no changes related to OpenTelemetry), the issue goes away :/

hungchu0912 avatar Oct 18 '24 08:10 hungchu0912

I'll try to rebuild another service that runs with the agent to see if the issue still persists. Edit: agent works fine for me

hungchu0912 avatar Oct 18 '24 08:10 hungchu0912

@givanovmras > I can't share the endpoints DN as these are privately created from within the portal

Can you list out step by step instructions how you create a private DN in the portal and where exactly?

Application Insights doesn't allow tampering with the connection string. This is not a feature we support. It requires a private certificate, which we do not support.

If you set up your own private domain name on your own server, and then configure your internal server to talk to our endpoint, that is allowed. However, in this case, you need to fix the DN yourself because it's between your app and your internal server.

heyams avatar Oct 23 '24 17:10 heyams

@heyams just to clarify: The private endpoint has been created as part of setting up Azure Monitor Private Link Scope for application insights. As part of this process (done via the portal) a DNS zone is automatically created (by Azure) that contains the various host names for ingestion and live metrics endpoints that point to the log analytics workspace/AI.

Again, this is done automatically and host names for these 2 endpoints are also allocated/determined automatically by Azure at creation time, dependent on many factors, including the zone in which these resources are created.

That's all well and good, but as part of that (automated) setup, the certificates for the two endpoints generated by Azure do not match the domain names allocated to the private endpoint (the CN in these certificates is a "generic" one for both endpoints).

As a result of that, when the Java AI agent tries to use either endpoint, it fails the SSL handshake since the domain names (in the private endpoint DNS zone that was created by Azure) do not match the CN in the certificate.

We do not have control over the certificates and cannot change them - they are generated by Azure when the private endpoint for the PLS has been created and its DNS zone set up.

This is an issue that is going to be logged separately with Azure/Microsoft to fix, but we need the ability to enable/disable host name verification in the Java AI agent, so that it can connect to both endpoints even though the CN in the certificate they present does not match the actual host name. This is what we are asking for here.

Hope the above makes sense! Thanks.

givanovmras avatar Oct 30 '24 08:10 givanovmras

Can you please try it again without modifying the connection string (copied straight from the Application Insights portal)? It doesn't need to match the host names in the DNS entries that are autogenerated by AMPLS. Those DNS entries are already in the CNAME resolution chain, so this should be handled for you automatically. Modifying the connection string will break SSL cert validation. The DNS resolution chain will route to the private endpoint for you.

heyams avatar Oct 30 '24 19:10 heyams

@heyams we haven't modified the connection string at all - it points to the ingestion and live metrics endpoints that we set up as part of the Private Link Scope setup.

The agent can establish a connection to that with no problem, but the SSL handshake fails, because the certificate presented by the relevant endpoint (auto-generated by Microsoft) doesn't match the name of it.

We can't use the publicly-available endpoint (as it is listed in the AI log analytics workspace), because we have a private vnet with disabled public access all the way through, so that's not going to work.

Again, all we ask is the ability to disable the host name verification on the agent, that's all. Can you do that please? Thanks.

givanovmras avatar Oct 31 '24 07:10 givanovmras

@givanovmras I have talked to our backend team. Please go ahead and create an IcM to find the root cause of your issue (the certificate mismatch). We need to know the full details of your setup and why you ran into this issue in the first place. They suspect that your setup is incorrect. Someone from the ingestion team will need to investigate it.

After that is done, we can circle back if there is a need to do anything from the SDK side.

heyams avatar Oct 31 '24 20:10 heyams

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.