tempo icon indicating copy to clipboard operation
tempo copied to clipboard

Multitenancy does not work with non-GRPC ingestion

Open mnadeem opened this issue 4 years ago • 18 comments

Describe the bug failed to extract org id when auth_enabled: true

To Reproduce Steps to reproduce the behavior:

tempo-local.yaml

auth_enabled: true

server:
  http_listen_port: 3100

distributor:
  receivers:                           # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:                            # the receives all come from the OpenTelemetry collector.  more configuration information can
      protocols:                       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver
        thrift_http:                   #
        grpc:                          # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  trace_idle_period: 10s               # the length of time after a trace has not received spans to consider it complete and flush it
  #max_block_bytes: 1_000_000           # cut the head block when it hits this size or ...
  traces_per_block: 1_000_000
  max_block_duration: 5m               #   this much time passes

compactor:
  compaction:
    compaction_window: 1h              # blocks in this time window will be compacted together
    max_compaction_objects: 1000000    # maximum size of compacted blocks
    block_retention: 1h
    compacted_block_retention: 10m

storage:
  trace:
    backend: local                     # backend configuration to use
    wal:
      path: E:\practices\docker\tempo\tempo\wal             # where to store the the wal locally
      bloom_filter_false_positive: .05 # bloom filter false positive rate.  lower values create larger filters but fewer false positives
      index_downsample: 10             # number of traces per index record
    local:
      path: E:\practices\docker\tempo\blocks
    pool:
      max_workers: 100                 # the worker pool mainly drives querying, but is also used for polling the blocklist
      queue_depth: 10000
docker run -d --rm -p 6831:6831/udp -p 9411:9411 -p 3100:3100  --name tempo -v E:\practices\docker\tempo\tempo-local.yaml:/etc/tempo-local.yaml --network docker-tempo  grafana/tempo:0.5.0 --config.file=/etc/tempo-local.yaml
docker run -d --rm -p 16686:16686 -v E:\practices\docker\tempo\tempo-query.yaml:/etc/tempo-query.yaml  --network docker-tempo  grafana/tempo-query:0.5.0  --grpc-storage-plugin.configuration-file=/etc/tempo-query.yaml
curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -H 'X-Scope-OrgID: demo' -d '[{
 "id": "1234",
 "traceId": "0123456789abcdef",
 "timestamp": 1608239395286533,
 "duration": 100000,
 "name": "span from bash!",
 "tags": {
    "http.method": "GET",
    "http.path": "/api"
  },
  "localEndpoint": {
    "serviceName": "shell script"
  }
}]'
level=error ts=2021-01-31T07:16:33.6168738Z caller=log.go:27 msg="failed to extract org id" err="no org id"

image

Expected behavior

Environment:

  • Infrastructure: [ Kubernetes, laptop]
  • Deployment tool: [manual and jenkins]

Additional Context

mnadeem avatar Jan 31 '21 07:01 mnadeem

Same issue with grafana/tempo:latest

mnadeem avatar Jan 31 '21 07:01 mnadeem

"auth_enabled" is a bit misleading, but it is consistent with Cortex and Loki. What this does is simply checks all incoming requests for a tenant id in the X-Scope-OrgID header. Unless you are doing a multitenant Tempo install just set auth_enabled to false.

joe-elliott avatar Jan 31 '21 17:01 joe-elliott

I have enabled multitent in tempo.yaml and started tempo backend

auth_enabled: true

Curl and post man fails even though I am passing the X-Scope-OrgID header

curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -H 'X-Scope-OrgID: demo' 

And the message is

level=error ts=2021-01-31T07:16:33.6168738Z caller=log.go:27 msg="failed to extract org id" err="no org id"

Note: I do want to enable multi tenant

Same behavior in openshift

mnadeem avatar Jan 31 '21 17:01 mnadeem

This smells like a bug on the Zipkin receiver, where we are not passing our authentication middleware to the zipkin server, but I will have to investigate further to confirm.

annanay25 avatar Feb 01 '21 10:02 annanay25

Also for our documentation, multitenancy_enabled is probably a better name for that config label, rather than auth_enabled.

annanay25 avatar Feb 01 '21 10:02 annanay25

Update after some preliminary digging -

  • Middleware authentication is not passed down to any receiver from the vendored collector
  • As of now, we are able to extract tenant information from grpc metadata only, because it translates header info into ctx
  • This commit shows what we need at every receiver to make multitenancy work (make sure to view the collapsed vendor changes)
  • It may be possible to wrap the Zipkin HTTP server and Jaeger HTTP Server with the middleware auth that we require but it requires better understanding. It is also less obvious how it would work with the other protocols

annanay25 avatar Feb 01 '21 14:02 annanay25

Although we can try, this unfortunately this might not be fixable the way we reuse the OTEL collector receivers. I've added an example in this PR that shows how you can use multitenancy in Tempo using the OTEL collector:

https://github.com/grafana/tempo/pull/497

Note the addition of an X-Scope-OrgID in the OTEL collector. This means that all traces passing through this collector will belong to the named tenant. You can then configure the collector to accept Zipkin.

    headers:
      x-scope-orgid: authed-tenant

The tempo datasource in Grafana also has headers added to indicate the tenant:

  jsonData:
    httpHeaderName1: 'Authorization'
  secureJsonData:
    httpHeaderValue1: 'Bearer authed-tenant'

and finally an extra parameter passed to tempo-query:

    command:
    - --query.bearer-token-propagation=true  # required to pass auth to the grpc plugin

Using the bearer token is unfortunate hack in series of unfortunate hacks. @annanay25 is currently working on dropping the Jaeger-query requirement which will at least cleanup this piece. @annanay25 please remember to clean up this example when you get rid of Jaeger query.

@mnadeem Thank you for reporting this issue. I am going to change the title to reflect that the issue is that multitenancy is not supported on non-GRPC ingestion. I consider everything in this post and the linked PR a workaround. So we will leave this issue open.

joe-elliott avatar Feb 02 '21 02:02 joe-elliott

@annanay25 please remember to clean up this example when you get rid of Jaeger query.

👍 I will make sure. This work will start in coordination with the Grafana project team, probably in the next couple of weeks.

annanay25 avatar Feb 02 '21 06:02 annanay25

@annanay25 @mnadeem , I found this article to be helpful, but I am also running into the same issue even with auth.enabled set to false. In my case it is single-tenant. I do not want to enable multi tenancy.

Environment : Stand alone Tempo deployment with storage as local Multi Tenancy : No auth-enabled : false Intent : to send Jaeger traces to Tempo for the purposes of tracing from traces to Loki logs. Tempo Version : 0.7 the latest

batwork7 avatar May 11 '21 16:05 batwork7

Any help in resolving this issue would be grealy appreciated.

batwork7 avatar May 11 '21 16:05 batwork7

I believe I'm experiencing this with Tempo 1.2.1 as well. Any updates here?

jjneely avatar Dec 29 '21 22:12 jjneely

No changes here, multitenancy is only supported when ingesting gRPC, so you should use either OTLP gRPC or Jaeger gRPC in a multitenant setup.

kvrhdn avatar Dec 30 '21 14:12 kvrhdn

I'm using a proxy in front of tempo that doesn't support grpc. Is there any plans to keep the tenant ID when going over the tempo-oltp-http endpoint?

flokli avatar Jan 19 '22 16:01 flokli

We would need changes upstream to support multitenancy over this endpoint. GRPC takes all headers and automatically injects them into the context. HTTP does not.

If you have options w/r to your proxy I'd recommend envoy it is an excellent HTTP2/GRPC proxy.

joe-elliott avatar Jan 19 '22 18:01 joe-elliott

Any solution that entails setting the tenant header at the client end introduces a security flaw: all it takes is knowing the tenant header, then you can impersonate that tenant with reads and writes. The fact is clients are often independent services that send traces over external internet connections and there is a need to authenticate those clients at the gateway.

To solved this problem (in Cortex), we have the client doing mTLS auth with the ingress-nginx gateway, then nginx would set the tenant header based on the CN in the TLS certificate presented by the client. I was looking to do something similar in Tempo, this time just to find out multi-tenancy is not supported over HTTP.

GRPC over ingress-nginx is not as trivial as this demonstration makes it sond: the success of it depends on the instrumentation at the server's end. So far I've got traces making it to Tempo, but these occasional logs in Grafana Agent make me think I'm loosing some traces because connection racing conditions:

ts=2022-04-13T21:34:23Z level=error caller=exporterhelper/queued_retry_inmemory.go:93 msg="Exporting failed. No more retries left. Dropping data." component=traces traces_config=tempo kind=exporter name=otlp/0 error="max elapsed time expired rpc error: code = Unavailable desc = connection closed before server preface received" dropped_items=366

danfromtitan avatar Apr 13 '22 21:04 danfromtitan

@liguozhong suggested in https://github.com/grafana/tempo/issues/1381#issuecomment-1098877991 that we could use the Authenticator interface to extract this information out of the request coming from the OpenTelemetry receivers.

kvrhdn avatar Apr 14 '22 13:04 kvrhdn

Any solution that entails setting the tenant header at the client end introduces a security flaw: all it takes is knowing the tenant header, then you can impersonate that tenant with reads and writes. The fact is clients are often independent services that send traces over external internet connections and there is a need to authenticate those clients at the gateway.

To solved this problem (in Cortex), we have the client doing mTLS auth with the ingress-nginx gateway, then nginx would set the tenant header based on the CN in the TLS certificate presented by the client. I was looking to do something similar in Tempo, this time just to find out multi-tenancy is not supported over HTTP.

Managed to get GRPC working through ingress-nginx, the gist of it is to use grpc_set_header in a configuration-snippet to insert the X-Scope-OrgID header. (Luckily the reading path does take the tenant name in an HTTP header, otherwise Grafana doesn't seem to have a decent GRPC data source that supports client TLS authentication.)

danfromtitan avatar Apr 18 '22 19:04 danfromtitan

grpc_set_header

The grpc_set_header suggestion is great, I didn't think about it before, but I solved it temporarily by another way (fork otlp collector). https://github.com/open-telemetry/opentelemetry-collector/issues/5216 https://github.com/liguozhong/opentelemetry-collector/commit/3f6214ee68930a23a656df9ea739e6b6ef457992

liguozhong avatar Apr 19 '22 09:04 liguozhong