Envoy can't run when providing service/domain name in cluster endpoint address instead of ip

Open shiponcs opened this issue 1 year ago • 5 comments

Title: Envoy can't run when providing service/domain name in cluster endpoint address instead of ip

Description: I have changed the API of the MySQL proxy filter so that it can be configured like this:

  - name: envoy.filters.network.mysql_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.mysql_proxy.v3.MySQLProxy
      stat_prefix: egress_mysql
      audit_log:
        name: envoy.audit_loggers.opentelemetry
        typed_config:
          "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
          grpc_service:
            envoy_grpc:
              cluster_name: opentelemetry_collector
            timeout: 0.250s
          service_name: gateway_mysql

I also need to add a cluster named opentelemetry_collector to the cluster list in the configuration, like this:

  - name: opentelemetry_collector
    type: STRICT_DNS
    connect_timeout: 500s
    load_assignment:
      cluster_name: opentelemetry_collector
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 4317

In the MySQL proxy source code, I fetch the address of this cluster inside the function createFilterFactoryFromProtoTyped, like this:

auto address_str = context.getTransportSocketFactoryContext().clusterManager().getThreadLocalCluster(otel_collector_cluster_name)->loadBalancer().chooseHost(nullptr)->address()->asString();

Here, otel_collector_cluster_name is always fetched correctly from proto_config, and for the cluster given above address_str comes out as 127.0.0.1:4317, as expected. The problem occurs when I put the hostname/FQDN of an endpoint in the address field of the opentelemetry_collector cluster's socket_address: the name is not resolved, and Envoy crashes at startup with the segmentation fault shown in the logs below.

Envoy Logs:

[2024-05-27 12:15:53.811][27481][info][config] [source/server/configuration_impl.cc:164] loading tracing configuration
[2024-05-27 12:15:53.811][27481][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-05-27 12:15:53.811][27481][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-05-27 12:15:53.812][27481][info][config] [source/server/configuration_impl.cc:134] loading 1 listener(s)
[2024-05-27 12:15:53.812][27481][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x0
[2024-05-27 12:15:53.812][27481][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-05-27 12:15:53.812][27481][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 6c426bcbc2b57540d1c1c864ca25534d09339179/1.30.0-dev/Modified/RELEASE/BoringSSL
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #0: [0x74f5e6442520]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #1: [0x57fb0d4a138e]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #2: [0x57fb0d0dc39a]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #3: [0x57fb0d0ebe2b]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #4: [0x57fb0d0eafc6]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #5: [0x57fb0d0eaad9]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #6: [0x57fb0d0f9f7c]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #7: [0x57fb0d0cbbaa]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #8: [0x57fb0d0e5b46]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #9: [0x57fb0d0e45f9]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #10: [0x57fb0d14800b]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #11: [0x57fb0cfa86a2]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #12: [0x57fb0cfa3adc]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #13: [0x57fb0cf54858]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #14: [0x57fb0cf55a0e]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #15: [0x57fb0cf53814]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #16: [0x57fb0cf53f7e]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #17: [0x57fb0cf5410c]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #18: [0x57fb0b76f14c]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #19: [0x74f5e6429d90]
fish: Job 1, './envoy-contrib -c /home/shipon…' terminated by signal SIGSEGV (Address boundary error)
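
The backtrace reports a fault at address 0x0, which is consistent with a null pointer being dereferenced somewhere in the chained call. One plausible reading, not confirmed in this thread, is that with a hostname the cluster has no resolved host yet when the filter factory is created, so the host lookup returns null. The sketch below uses hypothetical stand-in types (MockHost, MockCluster, MockClusterManager and resolveCollectorAddress are all invented for illustration and are not Envoy classes) to show the same chain with each link checked before use.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the real Envoy types.
struct MockHost {
  std::string address; // e.g. "127.0.0.1:4317"
};
using MockHostPtr = std::shared_ptr<const MockHost>;

struct MockCluster {
  // Assumption: stays empty until name resolution has produced a host.
  std::vector<MockHostPtr> hosts;

  // Returns nullptr when no host is available.
  MockHostPtr chooseHost() const { return hosts.empty() ? nullptr : hosts.front(); }
};

struct MockClusterManager {
  std::vector<std::pair<std::string, MockCluster>> clusters;

  // Returns nullptr when no cluster with that name exists.
  const MockCluster* getThreadLocalCluster(const std::string& name) const {
    for (const auto& entry : clusters) {
      if (entry.first == name) {
        return &entry.second;
      }
    }
    return nullptr;
  }
};

// Walks the same chain as the one-liner, but checks every pointer first.
// An empty optional means "no address available yet" instead of a crash.
std::optional<std::string> resolveCollectorAddress(const MockClusterManager& cm,
                                                   const std::string& cluster_name) {
  const MockCluster* cluster = cm.getThreadLocalCluster(cluster_name);
  if (cluster == nullptr) {
    return std::nullopt; // unknown cluster
  }
  MockHostPtr host = cluster->chooseHost();
  if (host == nullptr) {
    return std::nullopt; // cluster exists but has no host yet
  }
  return host->address;
}
```

Returning an empty optional lets the caller reject the configuration or retry later rather than crash. In real Envoy code the equivalent checks would go on the values returned by getThreadLocalCluster() and chooseHost().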

shiponcs, May 27 '24 06:05

Instead of trying to get the address of the upstream, you should just ask the cluster to give you a connection. DNS resolution will happen for you automatically. Probably ThreadLocalCluster::tcpConn() will do what you need.

ggreenway, May 28 '24 16:05

Tried this but got the same error:

context.getTransportSocketFactoryContext().clusterManager().getThreadLocalCluster(otel_collector_cluster_name)->tcpConn(nullptr).host_description_->address()->asString();

shiponcs, May 29 '24 10:05

You misunderstood what I'm suggesting. You don't want to get a connection so that you can get the address from it (presumably to create a connection); you want to get the connection and use that for communication with mysql (or whatever you're connecting to).

ggreenway, May 29 '24 15:05

Sorry. I need the address itself, because the otel collector configuration requires it. I am using the official otel SDK, which I don't think can be configured with the connection returned by ThreadLocalCluster::tcpConn() in place of an address.

shiponcs, May 30 '24 05:05
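
If the SDK only needs an endpoint string, one option not raised in this thread is to build that string from the configured socket_address (hostname and port) rather than from a resolved host, assuming the SDK's exporter accepts a host:port endpoint and resolves names itself. The sketch below is illustrative only: MockSocketAddress and toEndpoint are hypothetical names, not Envoy or SDK APIs.

```cpp
#include <cstdint>
#include <string>

// Hypothetical mirror of the address and port_value fields from the cluster config.
struct MockSocketAddress {
  std::string address; // IP literal or hostname/FQDN, exactly as configured
  std::uint32_t port_value;
};

// Builds a "host:port" endpoint string without resolving the name.
// IPv6 literals are wrapped in brackets so the port separator stays unambiguous.
std::string toEndpoint(const MockSocketAddress& sa) {
  const bool bare_ipv6 =
      sa.address.find(':') != std::string::npos && sa.address.front() != '[';
  const std::string host = bare_ipv6 ? "[" + sa.address + "]" : sa.address;
  return host + ":" + std::to_string(sa.port_value);
}
```

This keeps name resolution out of the filter factory entirely, at the cost of bypassing the cluster's own DNS refresh and load balancing.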

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot], Jun 29 '24 08:06

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot], Jul 06 '24 08:07