Envoy can't run when providing a service/domain name in the cluster endpoint address instead of an IP
Title: Envoy can't run when providing a service/domain name in the cluster endpoint address instead of an IP
Description: I have changed the API of the MySQL proxy so that I can configure it this way:
- name: envoy.filters.network.mysql_proxy
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.network.mysql_proxy.v3.MySQLProxy
    stat_prefix: egress_mysql
    audit_log:
      name: envoy.audit_loggers.opentelemetry
      typed_config:
        "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
        grpc_service:
          envoy_grpc:
            cluster_name: opentelemetry_collector
          timeout: 0.250s
        service_name: gateway_mysql
I also need to add a cluster named opentelemetry_collector to the configuration's cluster list, like this:
- name: opentelemetry_collector
  type: STRICT_DNS
  connect_timeout: 500s
  load_assignment:
    cluster_name: opentelemetry_collector
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 127.0.0.1
              port_value: 4317
In the MySQL proxy source code, I fetch the address of this cluster inside createFilterFactoryFromProtoTyped like this:
auto address_str = context.getTransportSocketFactoryContext()
                       .clusterManager()
                       .getThreadLocalCluster(otel_collector_cluster_name)
                       ->loadBalancer()
                       .chooseHost(nullptr)
                       ->address()
                       ->asString();
Here, otel_collector_cluster_name is always correctly read from proto_config, and address_str comes back as 127.0.0.1:4317, as expected for the cluster above.
The problem occurs when I provide the hostname/FQDN of an endpoint in the address field of the socket_address of the opentelemetry_collector cluster: instead of resolving the name, Envoy crashes on startup.
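The "suspect faulting address 0x0" in the backtrace below looks like a null dereference somewhere in that call chain. A guarded version of the same lookup (my assumption about what is happening, using only the calls from the snippet above; the exception type and messages are just placeholders) would be:

auto* cluster = context.getTransportSocketFactoryContext()
                    .clusterManager()
                    .getThreadLocalCluster(otel_collector_cluster_name);
if (cluster == nullptr) {
  // The thread-local cluster may not be known yet at config-load time.
  throw EnvoyException("otel collector cluster is not known yet");
}
auto host = cluster->loadBalancer().chooseHost(nullptr);
if (host == nullptr) {
  // A STRICT_DNS cluster has no hosts until its first DNS resolution completes,
  // so chooseHost() can return a null host here.
  throw EnvoyException("otel collector cluster has no resolved hosts yet");
}
auto address_str = host->address()->asString();

If that is right, it might explain why the literal 127.0.0.1 endpoint works while the FQDN crashes.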
Envoy Logs:
[2024-05-27 12:15:53.811][27481][info][config] [source/server/configuration_impl.cc:164] loading tracing configuration
[2024-05-27 12:15:53.811][27481][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-05-27 12:15:53.811][27481][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-05-27 12:15:53.812][27481][info][config] [source/server/configuration_impl.cc:134] loading 1 listener(s)
[2024-05-27 12:15:53.812][27481][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x0
[2024-05-27 12:15:53.812][27481][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-05-27 12:15:53.812][27481][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 6c426bcbc2b57540d1c1c864ca25534d09339179/1.30.0-dev/Modified/RELEASE/BoringSSL
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #0: [0x74f5e6442520]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #1: [0x57fb0d4a138e]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #2: [0x57fb0d0dc39a]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #3: [0x57fb0d0ebe2b]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #4: [0x57fb0d0eafc6]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #5: [0x57fb0d0eaad9]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #6: [0x57fb0d0f9f7c]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #7: [0x57fb0d0cbbaa]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #8: [0x57fb0d0e5b46]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #9: [0x57fb0d0e45f9]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #10: [0x57fb0d14800b]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #11: [0x57fb0cfa86a2]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #12: [0x57fb0cfa3adc]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #13: [0x57fb0cf54858]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #14: [0x57fb0cf55a0e]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #15: [0x57fb0cf53814]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #16: [0x57fb0cf53f7e]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #17: [0x57fb0cf5410c]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #18: [0x57fb0b76f14c]
[2024-05-27 12:15:53.813][27481][critical][backtrace] [./source/server/backtrace.h:98] #19: [0x74f5e6429d90]
fish: Job 1, './envoy-contrib -c /home/shipon…' terminated by signal SIGSEGV (Address boundary error)
Instead of trying to get the address of the upstream, you should just ask the cluster to give you a connection. DNS resolution will happen for you automatically. Probably ThreadLocalCluster::tcpConn() will do what you need.
Tried this but got the same error:
context.getTransportSocketFactoryContext()
    .clusterManager()
    .getThreadLocalCluster(otel_collector_cluster_name)
    ->tcpConn(nullptr)
    .host_description_->address()
    ->asString();
You misunderstood what I'm suggesting. You don't want to get a connection so that you can get the address from it (presumably to create a connection); you want to get the connection and use that for communication with mysql (or whatever you're connecting to).
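Roughly something like the following (an untested sketch; cluster_manager stands in for whatever ClusterManager reference you already have, and error handling is elided):

auto* cluster = cluster_manager.getThreadLocalCluster(otel_collector_cluster_name);
if (cluster != nullptr) {
  // tcpConn() asks the load balancer for a host from the cluster's (DNS-resolved)
  // host set and returns both the connection and the host description; the
  // connection can be null if no host is available yet.
  Upstream::Host::CreateConnectionData conn_data = cluster->tcpConn(nullptr);
  if (conn_data.connection_ != nullptr) {
    // Use the connection itself to talk to the upstream; don't pull the address
    // out of it just to dial a second connection yourself.
    conn_data.connection_->connect();
  }
}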
Sorry, I need the address itself because I need it for the OTel collector configuration. I am using the official OTel SDK, which I don't think can be configured with the connection returned by ThreadLocalCluster::tcpConn() instead of an address.
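What I may try instead is deferring the lookup until the cluster has had a chance to resolve DNS, rather than doing it inside createFilterFactoryFromProtoTyped. A sketch of that idea (my assumption, not verified against this crash) is a small helper that fails soft until a host is available:

// Look up the collector address lazily; returns nullopt until the STRICT_DNS
// cluster is known and has at least one resolved endpoint.
absl::optional<std::string> otelCollectorAddress(Upstream::ClusterManager& cm,
                                                 const std::string& cluster_name) {
  auto* cluster = cm.getThreadLocalCluster(cluster_name);
  if (cluster == nullptr) {
    return absl::nullopt;
  }
  auto host = cluster->loadBalancer().chooseHost(nullptr);
  if (host == nullptr) {
    return absl::nullopt;
  }
  return host->address()->asString();
}

The OTel exporter would then be (re)configured only once this returns a value.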
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.