OpenTelemetry input plugin not working with Envoy Proxy - throws HTTP 400 Error
Bug Report
Describe the bug
I'm trying to use the opentelemetry input plugin (HTTP/Protobuf) to capture traces from Envoy Proxy, but Fluent Bit returns a 400 Bad Request error whenever Envoy tries to send a trace.
To Reproduce
- Here is the Envoy configuration I'm using:
static_resources:
  listeners:
  - address:
      socket_address:
        address: "0.0.0.0"
        port_value: 30443
    filter_chains:
    - filters:
      - name: "envoy.filters.network.http_connection_manager"
        typed_config:
          "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager"
          codec_type: "auto"
          stat_prefix: "ingress_https"
          tracing:
            provider:
              name: "envoy.tracers.opentelemetry"
              typed_config:
                "@type": "type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig"
                http_service:
                  http_uri:
                    uri: "http://127.0.0.1:5317/v1/traces"
                    cluster: "opentelemetry_collector"
                    timeout: "0.250s"
          route_config:
            virtual_hosts:
            - name: "backend"
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: "Test"
                  auto_host_rewrite: true
                  timeout: "0s"
          http_filters:
          - name: "envoy.filters.http.router"
            typed_config:
              "@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"
      transport_socket:
        name: "envoy.transport_sockets.tls"
        typed_config:
          "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext"
          common_tls_context:
            tls_certificates:
            - certificate_chain:
                filename: "localhost.crt"
              private_key:
                filename: "localhost.key"
            tls_params:
              tls_maximum_protocol_version: "TLSv1_3"
              tls_minimum_protocol_version: "TLSv1_2"
  clusters:
  - name: "Test"
    connect_timeout: "30.0s"
    type: "STRICT_DNS"
    dns_lookup_family: "V4_ONLY"
    lb_policy: "ROUND_ROBIN"
    load_assignment:
      cluster_name: "Test"
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: "google.com"
                port_value: 443
    transport_socket:
      name: "envoy.transport_sockets.tls"
      typed_config:
        "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext"
  - name: "opentelemetry_collector"
    type: "STRICT_DNS"
    lb_policy: "ROUND_ROBIN"
    load_assignment:
      cluster_name: "opentelemetry_collector"
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: "127.0.0.1"
                port_value: 5317
admin:
  address:
    socket_address:
      address: "0.0.0.0"
      port_value: 8001
layered_runtime:
  layers:
  - name: "admin_layer_0"
    admin_layer: {}
- This configuration requires Envoy version 1.28 or above.
Your Environment
- Version used: Fluent Bit 2.1.10
- Environment name and version (e.g. Kubernetes? What version?): Virtual Machine
- Server type and version: Envoy Proxy 1.28
- Operating System and version: Ubuntu 20.04 LTS
- Filters and plugins: in_opentelemetry
@dceravigupta would you please provide more steps and the Fluent Bit config to reproduce the problem?
Usually what takes time for us is preparing the repro case, so if you can provide it, that would be great.
@edsiper here is my fluent bit config:
[INPUT]
    name   opentelemetry
    listen 127.0.0.1
    port   5317
    tag    opentelemetry.3

[OUTPUT]
    name  stdout
    match *
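To check whether the 400 depends on Envoy at all, the input can also be probed directly. The following is a minimal Python sketch and not part of the original report: `post_otlp_traces` is a hypothetical helper name, and it assumes Fluent Bit is listening over plain HTTP on 127.0.0.1:5317 as in the config above. An empty body is the protobuf encoding of a trace export request with no fields set; whether Fluent Bit accepts such a request is not confirmed here, so the returned status is a data point only.

```python
import http.client
from urllib.parse import urlsplit


def post_otlp_traces(url: str, body: bytes = b"", timeout: float = 5.0) -> int:
    """POST a protobuf payload to an OTLP/HTTP traces endpoint; return the HTTP status.

    Assumption: the endpoint speaks plain HTTP (no TLS), as in the config above.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request(
            "POST",
            parts.path or "/",
            body=body,
            # OTLP/HTTP with binary protobuf uses this content type.
            headers={"Content-Type": "application/x-protobuf"},
        )
        return conn.getresponse().status
    finally:
        conn.close()
```

With Fluent Bit running, `post_otlp_traces("http://127.0.0.1:5317/v1/traces")` returns the status code it answers with, which can be compared against what Envoy reports.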
Let me try to repro it again with the latest Fluent Bit; I will share the repro steps with all the configs soon. In the meantime, you can also check out a similar issue reported by other folks: https://github.com/fluent/fluent-bit/issues/7678
@edsiper please download the fluentbitenvoy.zip attachment into a directory on a Linux machine (I tried on Ubuntu).
Here are the repro steps:
- Open a bash terminal and navigate to the directory containing all the files.
- Install and launch Fluent Bit with the config in the attachment.
sudo apt-get install fluent-bit
/opt/fluent-bit/bin/fluent-bit -c fluentbit.conf
- In a second bash terminal, launch Envoy with the provided config.
./envoy_1_28 -l debug -c envoyconfig.yml
- Even without hitting the Envoy endpoint, you can see the following errors in its logs:
- If you don't see any errors, hit the following endpoint using curl in a third terminal.
curl -k https://localhost:30443/todos/1
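To see exactly what Envoy sends, one option is to stand in for Fluent Bit temporarily and record the incoming request. This is a rough Python sketch under stated assumptions, not something from the attachment: `CaptureHandler` and `start_capture_server` are hypothetical names, Fluent Bit must be stopped first so port 5317 is free, and the stand-in always answers 200. It reads the body only when a `Content-Length` header is present, so a chunked request would show `body_bytes` as 0 while its `Transfer-Encoding` header is still recorded.

```python
import http.server
import threading


class CaptureHandler(http.server.BaseHTTPRequestHandler):
    """Record the path, headers, and body size of each POST, then reply 200."""

    captured = []  # shared across requests: one dict per POST received

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # Chunked bodies are not decoded here; only Content-Length bodies are read.
        body = self.rfile.read(length) if length else b""
        self.captured.append(
            {
                "path": self.path,
                "headers": dict(self.headers.items()),
                "body_bytes": len(body),
            }
        )
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep stderr quiet; inspect CaptureHandler.captured instead


def start_capture_server(host="127.0.0.1", port=5317):
    """Start the capture server in a background thread and return it."""
    server = http.server.HTTPServer((host, port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling `start_capture_server()` and launching Envoy, printing `CaptureHandler.captured` shows the request path, content type, and framing headers of each trace export, which may help narrow down what the opentelemetry input is rejecting.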
@edsiper did you get a chance to look at the repro? Let me know if you need any more data points from my side. Thanks!
might be relevant: https://github.com/fluent/fluent-bit/issues/8742
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This issue was closed because it has been stalled for 5 days with no activity.