text-generation-inference
TraceID listed in server logs cannot be found in Grafana Tempo
System Info
HF-TGI server running on Kubernetes. Output of text-generation-launcher --env executed inside the pod:
2023-07-12T12:58:48.739266Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.70.0
Commit sha: 31b36cca21fcd0e6b7db477a7545063e1b860156
Docker label: sha-31b36cc
nvidia-smi:
Wed Jul 12 12:58:48 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100 80G... On | 00000001:00:00.0 Off | 0 |
| N/A 35C P0 71W / 300W | 46936MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
2023-07-12T12:58:48.739312Z INFO text_generation_launcher: Args { model_id: "OpenAssistant/falcon-40b-sft-mix-1226", revision: None, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), dtype: None, trust_remote_code: false, max_concurrent_requests: 512, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "production-hf-text-generation-inference-6594cb8f5d-z4mdf", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: Some("tempo.monitoring:4317"), cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: true }
Model being used:
{
"model_id": "OpenAssistant/falcon-40b-sft-mix-1226",
"model_sha": "9ac6b7846fabe144646213cf1c6ee048b88272a7",
"model_dtype": "torch.float16",
"model_device_type": "cuda",
"model_pipeline_tag": "text-generation",
"max_concurrent_requests": 512,
"max_best_of": 2,
"max_stop_sequences": 4,
"max_input_length": 1024,
"max_total_tokens": 2048,
"waiting_served_ratio": 1.2,
"max_batch_total_tokens": 16000,
"max_waiting_tokens": 20,
"validation_workers": 2,
"version": "0.9.1",
"sha": "31b36cca21fcd0e6b7db477a7545063e1b860156",
"docker_label": "sha-31b36cc"
}
Hardware used (GPUs, how many, on which cloud) (nvidia-smi): see nvidia-smi output above; runs on an Azure Kubernetes Service cluster, VM spec: Standard_NC24ads_A100_v4
Deployment specificities (Kubernetes, EKS, AKS, any particular deployments): Runs on AKS and is installed through a Helm chart. The current version being used: 0.9.1
Information
- [ ] Docker
- [ ] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction
I installed the HF TGI server, Prometheus, Grafana, Loki, and Grafana Tempo on Kubernetes. The latter four are in namespace monitoring and the HF TGI server is in namespace hf-tgi. HF-TGI is created with the following environment variable set: OTLP_ENDPOINT: "tempo.monitoring:4317", i.e. it references the service tempo in namespace monitoring on port 4317. Service is up and running.
So far this works fine: in Grafana under "Explore", I can select "Tempo", click on "Search", and run the query. It then finds a lot of traces, mostly from target /Health, sometimes /Decode.
Now, when I go to "Explore", select "Loki", and query the logs from the HF TGI pod, I can see the same info messages as on the server's stdout. In these messages there is an entry in the JSON called "spans[0].trace_id". When I take the value from that field and search for it in "Explore" -> "Tempo" -> TraceQL, I get an error message that the trace was not found:
failed to get trace with id: XXXX Status: 404 Not Found Body: trace not found
Expected behavior
My expected behavior would be: TraceIDs listed in the info messages on the server should point to a trace that exists.
However, I am new to tracing (and the Prometheus-Grafana-etc. stack), so my question is also whether I am misconfiguring something here. I think it is a bug because I can see some traces, but the TraceID from the info log cannot be found.
It's possible that you are just missing the http:// scheme: OTLP_ENDPOINT: "http://tempo.monitoring:4317".
The traces you see are from the Python server, but it doesn't seem to collect the traces from the rust webserver (which is the one displaying the trace ID in the logs).
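As a side note on why the scheme matters: port 4317 is the OTLP gRPC port, so the router presumably exports through the opentelemetry-otlp tonic exporter, which treats the endpoint as a full URI. A minimal sketch of that kind of pipeline setup, assuming opentelemetry ~0.19 and opentelemetry-otlp ~0.12 with the "tonic" and "rt-tokio" features (this is not the literal TGI router code):

// Sketch only, not taken from the TGI codebase.
use opentelemetry::sdk::trace::Tracer;
use opentelemetry::trace::TraceError;
use opentelemetry_otlp::WithExportConfig;

fn init_otlp_tracer(otlp_endpoint: &str) -> Result<Tracer, TraceError> {
    // The gRPC (tonic) exporter parses the endpoint as a URI, so a bare
    // "tempo.monitoring:4317" may fail to export, while
    // "http://tempo.monitoring:4317" works.
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(otlp_endpoint),
        )
        .install_batch(opentelemetry::runtime::Tokio)
}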
Many thanks for the fast response, that worked! I now get the traces from the Rust webserver. However, when searching for the TraceIDs from the logs in Tempo, I still get 404 errors. On the other hand, under "Explore" I can find the POST /generate span and a trace for it, but the TraceIDs there are different from the ones in the logs. For example, the TraceID from the log of a request cannot be found, while the corresponding trace found under "Explore" -> "Tempo" has a different ID (the same happens for a second request).
Is that still a misconfiguration on my part?
Sorry to bother again but I did not resolve this. I still get 404 errors on the Trace IDs that I find in the log. Do you have any advice on what I could try?
So I did some more digging and by enabling log_received_spans and log_received_spans.include_all_attributes in Grafana Tempo I logged what spans arrive in Tempo. I extracted the corresponding log entry for a call to /generate and it seems like there are three IDs, span_trace_id, spanid, and traceid:
level=info ts=2023-09-05T12:49:30.2719584Z caller=distributor.go:591
span_service_name=text-generation-inference.router span_http_target=/health
span_http_user_agent=kube-probe/1.26 span_trace_id=977145f925d16e8235839a5b6ed18e83
span_http_status_code=200 span_busy_ns=403196 span_code_lineno=161
span_thread_name=tokio-runtime-worker span_http_route=/health span_idle_ns=2811510
span_http_flavor=1.1
span_code_filepath=/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/axum-tracing-opentelemetry-0.10.0/src/middleware/trace_extractor.rs
span_http_method=GET span_code_namespace=axum_tracing_opentelemetry::middleware::trace_extractor
span_http_host=X span_http_scheme=HTTP span_thread_id=14 span_http_client_ip=
span_name="GET /health" span_duration_seconds=0.003230504 span_kind=SPAN_KIND_SERVER
span_status=STATUS_CODE_OK msg=received spanid=5e65f3ea6125c531
traceid=8b6a7418f11718fbc86cff5cbdf60e81
span_trace_id is the only one that is displayed in the logs of TGI and this is the one that I cannot find in Grafana when I search the Tempo datasource. spanid and traceid are visible in Grafana, so they work, but have no connection to span_trace_id.
I tried adding some formatting, or rather extraction, to your router code to include spanid and traceid in the logs, but I did not get very far since I have never programmed in Rust. I would be happy if you could implement something that explicitly includes spanid and traceid in the TGI logs. I found some people that already tried doing that using telemetry and tokio: https://github.com/tokio-rs/tracing/issues/1531
Other than that, I will try to figure out whether Grafana and/or Tempo can be configured to extract the correct IDs, since Tempo receives all three of them as seen in the log, or I will switch from Tempo to Jaeger, which might not have this problem.
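For reference, the usual way to surface the OpenTelemetry IDs with tracing-opentelemetry is to read them off the current span's context. A minimal sketch, not taken from the TGI codebase and assuming compatible opentelemetry/tracing-opentelemetry versions:

// Sketch only: assumes the OpenTelemetry layer is already installed in the router.
use opentelemetry::trace::TraceContextExt; // .span() on an otel Context
use tracing_opentelemetry::OpenTelemetrySpanExt; // .context() on a tracing::Span

fn log_otel_ids() {
    let otel_context = tracing::Span::current().context();
    let span_ref = otel_context.span();
    let span_context = span_ref.span_context();
    // These are the IDs Tempo indexes (the spanid/traceid fields in the
    // distributor log above), unlike the span_trace_id shown by default.
    tracing::info!(
        otel_trace_id = ?span_context.trace_id(),
        otel_span_id = ?span_context.span_id(),
        "OpenTelemetry ids for the current span"
    );
}

Emitting these two fields next to the existing span_trace_id would presumably make the log entries searchable in Tempo.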
That's very interesting, thanks for sharing. It seems to be a bug in the code that generates the trace ID. We never run into this issue on our side (since we always receive a trace ID inside the header of the request), so I'm not sure where the bug is exactly. I will create a PR with a fix once I find it.
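To illustrate the point about request headers: if a caller sends a W3C trace context header, the propagator-extracted remote context becomes the parent, and the server span reuses the incoming trace ID, so the ID-generation path is never exercised. A hypothetical sketch of that extraction step (not TGI's actual code; opentelemetry-http's HeaderExtractor is assumed to be available):

// Hypothetical illustration: a request carrying a header such as
//   traceparent: 00-<trace-id, 32 hex chars>-<parent-span-id, 16 hex chars>-01
// yields a remote parent context, and the server span inherits that trace ID
// instead of generating a fresh one.
use axum::http::HeaderMap;
use opentelemetry::global;
use opentelemetry_http::HeaderExtractor;

fn parent_context(headers: &HeaderMap) -> opentelemetry::Context {
    global::get_text_map_propagator(|propagator| propagator.extract(&HeaderExtractor(headers)))
}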
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.