How to push traces from a remote application using the gateway?

Open freed-git opened this issue 4 years ago • 6 comments

Hi

I have Tempo set up on Kubernetes using the grafana/tempo-distributed Helm chart (chart version 0.15.1, app version 1.3.0). In the chart's values file, I set both gateway.enabled and gateway.ingress.enabled to true. I'm trying to send traces from an application outside the cluster, over the public internet, to the Tempo gateway. I can't find any documentation on how to do this from Python code. Can someone point me to some documentation or share some example Python code?

Any help is appreciated.

Below is the sample code I'm using for the Python client application:

import requests
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource, SERVICE_NAME
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
import logging
from opentelemetry.instrumentation.logging import LoggingInstrumentor


trace.set_tracer_provider(
    TracerProvider(
        resource=Resource.create({SERVICE_NAME: "trace-cli"})
    )
)

# Get the tracer after the global provider has been registered.
tracer = trace.get_tracer(__name__)

# exporter = ConsoleSpanExporter()
exporter = JaegerExporter(
    agent_host_name="tempo.eks01.example.org",
    agent_port=443
)

trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(exporter)
)

# You can optionally pass a custom TracerProvider to instrument().
RequestsInstrumentor().instrument()
LoggingInstrumentor().instrument()

with tracer.start_as_current_span('test-span'):
    response = requests.get(url="https://compute.eks01.example.org/compute/6")
    logging.info("test")

print(response.json())

freed-git avatar Mar 02 '22 15:03 freed-git

I'll admit I'm not super familiar with the gateway option in the Tempo Helm chart.

Are you seeing any errors in the logs of your application or the gateway?

joe-elliott avatar Mar 02 '22 19:03 joe-elliott

I made some progress with the Tempo gateway. I had to change the JaegerExporter to use the collector_endpoint argument instead of agent_host_name and agent_port. Now I'm seeing the traces in Grafana, but the client (A) and server (B) spans are not nested properly. See the image below. What can I do to fix that?

Here's the updated code:

import requests
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource, SERVICE_NAME
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
import logging
from opentelemetry.instrumentation.logging import LoggingInstrumentor


trace.set_tracer_provider(
    TracerProvider(
        resource=Resource.create({SERVICE_NAME: "trace-cli"})
    )
)

# Get the tracer after the global provider has been registered.
tracer = trace.get_tracer(__name__)

# exporter = ConsoleSpanExporter()
# See https://opentelemetry-python.readthedocs.io/en/latest/exporter/jaeger/jaeger.html
exporter = JaegerExporter(
    collector_endpoint="https://tempo.eks01.example.org/jaeger/api/traces",
)

trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(exporter)
)

# You can optionally pass a custom TracerProvider to instrument().
RequestsInstrumentor().instrument()
LoggingInstrumentor().instrument()

with tracer.start_as_current_span('test-span'):
    response = requests.get(url="https://compute.eks01.example.org/compute/6")
    logging.info("test")

print(response.json())

[Image: otel-trace]
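
For anyone landing here with the same problem, the change above comes down to which transport the exporter uses. As far as I understand the Thrift exporter, agent_host_name/agent_port send spans over UDP to a Jaeger agent (default port 6831), which an HTTPS ingress on port 443 will most likely not accept, while collector_endpoint posts them over HTTP(S). Here is a minimal sketch using a hypothetical helper, jaeger_exporter_kwargs, which is not part of OpenTelemetry and only picks the keyword arguments:

```python
def jaeger_exporter_kwargs(target: str) -> dict:
    """Hypothetical helper: choose JaegerExporter keyword arguments for a target.

    An http(s) URL is treated as a collector-style endpoint (Thrift over HTTP);
    anything else is treated as the host[:port] of a Jaeger agent (Thrift over UDP).
    """
    if target.startswith(("http://", "https://")):
        return {"collector_endpoint": target}

    host, _, port = target.partition(":")
    # 6831 is the conventional Jaeger agent port for compact Thrift over UDP.
    return {"agent_host_name": host, "agent_port": int(port) if port else 6831}


# Usage (requires opentelemetry-exporter-jaeger-thrift to be installed):
# from opentelemetry.exporter.jaeger.thrift import JaegerExporter
# exporter = JaegerExporter(
#     **jaeger_exporter_kwargs("https://tempo.eks01.example.org/jaeger/api/traces")
# )
```

The /jaeger/api/traces path is the one from the code above; whether your gateway exposes it depends on the chart's gateway configuration.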

freed-git avatar Mar 02 '22 20:03 freed-git

The nesting looks correct to me: each span is correctly attached to its parent.

However, you do have a large time gap between your trace-cli HTTP GET and the compute /compute/<int:n>. If these processes are running on two different machines, it's possible you are just seeing the difference in their system clocks.

joe-elliott avatar Mar 02 '22 20:03 joe-elliott

@joe-elliott thank you for the fast reply

Indeed, trace-cli and compute are processes running on different machines. I was expecting to see trace-cli include the compute child spans with no gap. At the very least the blue lines from trace-cli should extend to incorporate the compute service since requests.get in trace-cli is a synchronous blocking call against the compute microservice.

Can this be fixed?

freed-git avatar Mar 02 '22 20:03 freed-git

Aside from making sure that all your machines are time synced, not that I'm aware of. Jaeger has the ability to adjust spans to be beneath their parents to improve the look of the trace, but it has been received with mixed opinions:

https://github.com/jaegertracing/jaeger/issues/1459

I don't believe that Grafana has any similar capability, but I will defer to @connorlindsey
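
To illustrate the kind of adjustment being discussed, here is a simplified sketch of shifting a child span so that it sits inside its parent. It is not Jaeger's actual implementation. The Span class, its field names, and the assumption that network latency is split evenly before and after the remote call are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical minimal span; times are microseconds since the epoch."""
    name: str
    start_us: int
    duration_us: int

    @property
    def end_us(self) -> int:
        return self.start_us + self.duration_us


def estimate_skew_us(parent: Span, child: Span) -> int:
    """Return the offset to add to the child's timestamps to place it inside the parent."""
    if parent.start_us <= child.start_us and child.end_us <= parent.end_us:
        return 0  # The child already fits; nothing to adjust.

    if child.duration_us > parent.duration_us:
        # The child cannot fit inside the parent. Only pull it forward
        # if it appears to start before the parent does.
        return max(0, parent.start_us - child.start_us)

    # Assumption: the time the parent spent outside the child is network
    # latency, split evenly between the request and the response.
    latency_us = (parent.duration_us - child.duration_us) // 2
    return parent.start_us + latency_us - child.start_us


# Example: a 400 ms client span and a 300 ms server span whose clock runs ahead.
client = Span("trace-cli HTTP GET", start_us=1_000_000, duration_us=400_000)
server = Span("compute /compute/<int:n>", start_us=3_000_000, duration_us=300_000)
skew = estimate_skew_us(client, server)
adjusted = Span(server.name, server.start_us + skew, server.duration_us)
```

An adjustment like this only changes how the trace is drawn. It guesses at the real offset, which is part of why the feature gets mixed opinions, so syncing the machines' clocks is still the actual fix.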

joe-elliott avatar Mar 02 '22 20:03 joe-elliott

There's a GitHub discussion about adding the ability to toggle clock skew correction in the frontend: https://github.com/grafana/grafana/discussions/39912

At the moment, we haven't seen enough demand to warrant building this, but we're always open to working with external contributors to get it built sooner.

connorlindsey avatar Mar 02 '22 21:03 connorlindsey

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

github-actions[bot] avatar Nov 16 '22 00:11 github-actions[bot]