opentelemetry-collector
Multi-tenancy E2E test
Is your feature request related to a problem? Please describe.
End-to-end test that verifies multi-tenancy support.
Describe the solution you'd like
I am not familiar with the topic, but it came up in discussion with @AlexanderWert and @jpkrohling during KubeCon EU that it would be great to have an E2E test around multi-tenancy support, since we are not sure what is currently missing in the collector.
@jpkrohling you mentioned that you have some initial ideas about the requirements for the tests and would be happy to share them. Please feel free to edit this issue, or open another one to capture the requirements.
I know that @dgoscn was working on some manual verifications to make sure that an end-to-end solution was possible, involving the header-setter extension and eventually reaching the Loki exporter (or OTLP HTTP). The idea is that a tenant header (X-Scope-OrgID) sent to the original receiver could be propagated down to the backends via our exporters.
Let me know if there are updates, or if you need further clarification on what would be needed to perform this (manual) test.
Hello, @carsonip and @jpkrohling.
As Juraci mentioned above, I was working on some manual verifications some time ago with the header-setter extension. I will share some of my verifications below.
PS: I divided this answer into two blocks:
- Setup
- Workflow to validate the steps
Setup
opentelemetry-collector-contrib - 0.90.1
config.yaml
receivers:
  otlp:
    protocols:
      grpc: # 4317
        endpoint: localhost:4327
        include_metadata: true
      http: # 4318
        endpoint: 0.0.0.0:4328
        include_metadata: true

processors:

extensions:
  headers_setter:
    headers:
      - action: upsert
        key: X-Scope-OrgID
        from_context: x-scope-orgid

exporters:
  # where the data goes
  logging/info:
    verbosity: basic
  logging/debug:
    verbosity: detailed
  prometheusremotewrite:
    endpoint: "http://localhost:9009/api/v1/push"
    auth:
      authenticator: headers_setter
    tls:
      insecure: true
  otlp:
    endpoint: http://localhost:4317
    auth:
      authenticator: headers_setter
    tls:
      insecure: true

connectors:
  spanmetrics:
    dimensions:
      - name: http.method
      - name: http.status_code
    metrics_flush_interval: 15s

service:
  telemetry:
    metrics:
      level: "detailed"
    traces:
      propagators: "tracecontext"
  extensions: [headers_setter]
  pipelines:
    traces:
      receivers: [otlp]
      # processors: []
      exporters: [spanmetrics, otlp]
    metrics:
      receivers: [spanmetrics]
      processors: []
      # exporters: [prometheusremotewrite, logging/debug, prometheus]
      exporters: [prometheusremotewrite]
Running the collector
cd ./cmd/otelcontribcol && GO111MODULE=on go run --race . --config ../../local/config.yaml --log-level=DEBUG
Grafana Mimir
I made use of Play with Grafana Mimir for testing.
On the Mimir directory:
cd mimir && cd docs/sources/mimir/get-started/play-with-grafana-mimir
docker compose up
Grafana Mimir Dashboard - http://localhost:9000/
Mimir UI - http://localhost:9009/
Grafana Tempo
I downloaded tempo_2.2.3_linux_amd64.tar.gz from the Grafana Tempo GitHub releases page, and then executed the following commands:
tar vxf tempo_2.2.3_linux_amd64.tar.gz
cd tempo
./tempo --config.file tempo.yaml --multitenancy.enabled=true
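For reference, multitenancy can also be enabled in tempo.yaml itself instead of via the CLI flag. This is only a minimal sketch matching the setup above; the ports and storage path are assumptions, not taken from the original test:

```yaml
# Sketch of a minimal tempo.yaml for this test.
# Ports and the storage path are assumed values; adjust to your environment.
multitenancy_enabled: true   # same effect as --multitenancy.enabled=true

server:
  http_listen_port: 3200     # Tempo query/API port

distributor:
  receivers:
    otlp:
      protocols:
        grpc:                # listens on 4317, the collector's otlp exporter target

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks
```

With multitenancy enabled, Tempo rejects ingest and query requests that do not carry an X-Scope-OrgID header, which is what makes it useful for verifying the header propagation end to end.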
Grafana Dashboard
I had some problems testing some data sources with the Grafana dashboard from Play with Grafana Mimir, so I decided to create a separate Grafana instance just for the sake of happiness (sorry for the overhead).
Downloaded from https://grafana.com/grafana/download/9.4.3 and run with:
cd grafana-9.4.3 && ./bin/grafana server
Grafana UI - http://localhost:3000/
Workflow to validate the steps
Collector:
extensions:
  headers_setter:
    headers:
      - action: upsert
        key: X-Scope-OrgID
        from_context: x-scope-orgid

service:
  extensions:
    - headers_setter
  pipelines:
    metrics:
      receivers:
        - spanmetrics # (3) connector
      processors:
      exporters:
        - prometheusremotewrite # (4) mimir
    traces:
      receivers:
        - otlp # (1) telemetrygen
      processors:
      exporters:
        - spanmetrics # (2) connector
        - otlp # (2) Tempo
The objective is for the HTTP request in step (4) to carry the X-Scope-OrgID received from the client.
- Create an instance of Mimir to receive data via Remote Write
- Create an instance of Grafana to visualize the data
- Create a Tempo instance for traces
- Send data to Mimir with X-Scope-OrgID
- Send data to Tempo with X-Scope-OrgID
- Create a Mimir data source in Grafana using X-Scope-OrgID
- Create a Tempo data source in Grafana using X-Scope-OrgID
- View tenant metrics and traces
tracegen to generate trace telemetry:
tracegen -otlp-endpoint localhost:4317 -otlp-insecure -service cajuina -otlp-header 'X-Scope-OrgID="demo"'
telemetrygen to generate metric telemetry:
telemetrygen metrics --otlp-header='X-Scope-OrgID="123"' --otlp-insecure
Checking the outputs
In Grafana Explore, you can see the results from tracegen.
You can also check the header X-Scope-OrgID="demo" set on the Tempo data source.
It is likewise set for the Grafana Mimir data source.
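For completeness, the per-tenant data sources can also be provisioned declaratively instead of through the UI. This is a sketch using Grafana's data source provisioning format; the data source names, file path, and URLs are assumptions matching the local setup above:

```yaml
# Sketch of a Grafana provisioning file,
# e.g. provisioning/datasources/tenants.yaml (assumed path).
apiVersion: 1
datasources:
  - name: Mimir (tenant demo)
    type: prometheus
    url: http://localhost:9009/prometheus   # Mimir's Prometheus-compatible API
    jsonData:
      httpHeaderName1: X-Scope-OrgID        # header name sent with every query
    secureJsonData:
      httpHeaderValue1: demo                # tenant value
  - name: Tempo (tenant demo)
    type: tempo
    url: http://localhost:3200
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: demo
```

One data source per tenant makes the isolation easy to verify: a query through the "demo" data source should only ever return that tenant's data.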
However, given the time that has passed, I don't remember exactly what the steps were to validate Mimir. For instance, I had some trouble re-running telemetrygen for metrics, i.e.:
telemetrygen metrics --workers 1 --interval 1s --metrics 1 --rate 1 --otlp-endpoint localhost:4317 --otlp-insecure --otlp-header='X-Scope-OrgID="demo"'
2024-05-01T12:39:33.690-0300 INFO grpc@.../clientconn.go:1338 [core][Channel #1 SubChannel #2] Subchannel Connectivity change to READY {"system": "grpc", "grpc_log": true}
2024-05-01T12:39:33.690-0300 INFO grpc@.../clientconn.go:592 [core][Channel #1] Channel Connectivity change to READY {"system": "grpc", "grpc_log": true}
2024-05-01T12:39:33.690-0300 INFO metrics/metrics.go:94 generation of metrics is limited {"per-second": 1}
2024-05-01T12:39:33.691-0300 FATAL metrics/worker.go:55 exporter failed {"worker": 0, "error": "failed to upload metrics: rpc error: code = Unimplemented desc = unknown service opentelemetry.proto.collector.metrics.v1.MetricsService"}
Sorry for any confusion; I omitted some steps to avoid a wrong analysis. (One possible cause of the error above: the OTLP gRPC receiver in my config listens on localhost:4327, while this telemetrygen run targets the default localhost:4317, which is where Tempo's OTLP endpoint is listening; Tempo implements only the trace service, which would explain the "unknown service ...MetricsService" error.) I hope this helps and that we can resolve the Mimir/metrics step.
@carsonip , the idea is basically to:
- generate telemetry via telemetrygen and send an X-Scope-OrgID header with the outgoing request
- set include_metadata on the receiver side, so that the HTTP headers are placed in the context
- export the received telemetry to a backend like Tempo, using the header-setter authenticator to propagate the X-Scope-OrgID header
- configure Tempo's data source in Grafana to show only data related to the tenant set in step 1
The same steps apply to metrics, and to the span metrics generated by the connector. A further step would be to use the batch processor, grouping by tenant.
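For that batch processor step, the processor supports grouping batches by client metadata keys. A minimal sketch, assuming the same x-scope-orgid context key used by headers_setter above (this also relies on include_metadata being enabled on the receivers, as in the config earlier in this thread):

```yaml
processors:
  batch:
    # Batches are split per distinct value of these client-metadata keys,
    # so each tenant's telemetry is batched and exported separately.
    metadata_keys:
      - x-scope-orgid
    metadata_cardinality_limit: 1000  # cap on distinct metadata combinations
```

This keeps tenants from being mixed inside a single batch, so the headers_setter authenticator can attach the correct X-Scope-OrgID to each outgoing request.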