text-generation-inference
TraceID listed in server logs cannot be found in Grafana Tempo
System Info
HF-TGI server running on Kubernetes. Output of text-generation-launcher --env executed inside the pod:
2023-07-12T12:58:48.739266Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.70.0
Commit sha: 31b36cca21fcd0e6b7db477a7545063e1b860156
Docker label: sha-31b36cc
nvidia-smi:
Wed Jul 12 12:58:48 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100 80G... On | 00000001:00:00.0 Off | 0 |
| N/A 35C P0 71W / 300W | 46936MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
2023-07-12T12:58:48.739312Z INFO text_generation_launcher: Args { model_id: "OpenAssistant/falcon-40b-sft-mix-1226", revision: None, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), dtype: None, trust_remote_code: false, max_concurrent_requests: 512, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "production-hf-text-generation-inference-6594cb8f5d-z4mdf", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: Some("tempo.monitoring:4317"), cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: true }
Model being used:
{
"model_id": "OpenAssistant/falcon-40b-sft-mix-1226",
"model_sha": "9ac6b7846fabe144646213cf1c6ee048b88272a7",
"model_dtype": "torch.float16",
"model_device_type": "cuda",
"model_pipeline_tag": "text-generation",
"max_concurrent_requests": 512,
"max_best_of": 2,
"max_stop_sequences": 4,
"max_input_length": 1024,
"max_total_tokens": 2048,
"waiting_served_ratio": 1.2,
"max_batch_total_tokens": 16000,
"max_waiting_tokens": 20,
"validation_workers": 2,
"version": "0.9.1",
"sha": "31b36cca21fcd0e6b7db477a7545063e1b860156",
"docker_label": "sha-31b36cc"
}
Hardware used (GPUs, how many, on which cloud) (nvidia-smi): see nvidia-smi output above; runs on an Azure Kubernetes Service cluster, VM spec: Standard_NC24ads_A100_v4
Deployment specificities (Kubernetes, EKS, AKS, any particular deployments): Runs on AKS and is installed through a Helm chart. The current version being used: 0.9.1
Information
- [ ] Docker
- [ ] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction
I installed the HF TGI server, Prometheus, Grafana, Loki, and Grafana Tempo on Kubernetes. The latter four are in namespace monitoring and the HF TGI server is in namespace hf-tgi. HF-TGI is created with the following environment variable set: OTLP_ENDPOINT: "tempo.monitoring:4317", i.e. it references the service tempo in namespace monitoring on port 4317. Service is up and running.
So far this works fine: in Grafana under "Explore", I can select "Tempo", click on "Search", and run the query. It then finds a lot of traces, mostly from target /Health, sometimes /Decode.
Now, when I go to "Explore", select "Loki", and query the logs from the HF TGI pod, I can see the same info messages as on the server's stdout. In these messages there is an entry in the JSON called "spans[0].trace_id". When I take the value from that field and search for it in "Explore" -> "Tempo" -> TraceQL, I get an error message that the trace was not found:
failed to get trace with id: XXXX Status: 404 Not Found Body: trace not found
Expected behavior
My expected behavior would be: TraceIDs listed in the info messages on the server should point to a trace that exists.
However, I am new to tracing (and the Prometheus-Grafana-etc. stack), so my question is also whether I am misconfiguring something here. I think it is a bug because I can see some traces, but the TraceID from the info log cannot be found.
It's possible that you are just missing the http:// scheme: OTLP_ENDPOINT: "http://tempo.monitoring:4317".
The traces you see are from the Python server, but it doesn't seem to collect the traces from the rust webserver (which is the one displaying the trace ID in the logs).
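As a side note on why the scheme matters: port 4317 is the OTLP gRPC port, so the router presumably exports through the opentelemetry-otlp tonic exporter, which treats the endpoint as a full URI. A minimal sketch of that kind of pipeline setup, assuming opentelemetry ~0.19 and opentelemetry-otlp ~0.12 with the "tonic" and "rt-tokio" features (this is not the literal TGI router code):

// Sketch only, not taken from the TGI codebase.
use opentelemetry::sdk::trace::Tracer;
use opentelemetry::trace::TraceError;
use opentelemetry_otlp::WithExportConfig;

fn init_otlp_tracer(otlp_endpoint: &str) -> Result<Tracer, TraceError> {
    // The gRPC (tonic) exporter parses the endpoint as a URI, so a bare
    // "tempo.monitoring:4317" may fail to export, while
    // "http://tempo.monitoring:4317" works.
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(otlp_endpoint),
        )
        .install_batch(opentelemetry::runtime::Tokio)
}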
Many thanks for the fast response, that worked! I now get the traces from the Rust webserver. However, when searching for the TraceIDs from the logs in Tempo, I still get 404 errors. On the other hand, under "Explore" I can find the POST /generate span and a trace for it, but the TraceIDs there are different from the ones in the logs. For example, the TraceID from the log of a request cannot be found, while the corresponding trace found under "Explore" -> "Tempo" has a different ID (the same happens for a second request).
Is that still a misconfiguration on my part?
Sorry to bother again but I did not resolve this. I still get 404 errors on the Trace IDs that I find in the log. Do you have any advice on what I could try?
So I did some more digging and by enabling log_received_spans and log_received_spans.include_all_attributes in Grafana Tempo I logged what spans arrive in Tempo. I extracted the corresponding log entry for a call to /generate and it seems like there are three IDs, span_trace_id, spanid, and traceid:
level=info ts=2023-09-05T12:49:30.2719584Z caller=distributor.go:591
span_service_name=text-generation-inference.router span_http_target=/health
span_http_user_agent=kube-probe/1.26 span_trace_id=977145f925d16e8235839a5b6ed18e83
span_http_status_code=200 span_busy_ns=403196 span_code_lineno=161
span_thread_name=tokio-runtime-worker span_http_route=/health span_idle_ns=2811510
span_http_flavor=1.1
span_code_filepath=/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/axum-tracing-opentelemetry-0.10.0/src/middleware/trace_extractor.rs
span_http_method=GET span_code_namespace=axum_tracing_opentelemetry::middleware::trace_extractor
span_http_host=X span_http_scheme=HTTP span_thread_id=14 span_http_client_ip=
span_name="GET /health" span_duration_seconds=0.003230504 span_kind=SPAN_KIND_SERVER
span_status=STATUS_CODE_OK msg=received spanid=5e65f3ea6125c531
traceid=8b6a7418f11718fbc86cff5cbdf60e81
span_trace_id is the only one that is displayed in the logs of TGI and this is the one that I cannot find in Grafana when I search the Tempo datasource. spanid and traceid are visible in Grafana, so they work, but have no connection to span_trace_id.
I tried adding some formatting, or rather extraction, to your router code to include spanid and traceid in the logs, but I did not get very far since I have never programmed in Rust. I would be happy if you could implement something that explicitly includes spanid and traceid in the TGI logs. I found some people that already tried doing that using telemetry and tokio: https://github.com/tokio-rs/tracing/issues/1531
Other than that, I will try to figure out whether Grafana and/or Tempo can be configured to extract the correct IDs, since Tempo receives all three of them as seen in the log, or I will switch from Tempo to Jaeger, which might not have this problem.
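For reference, the usual way to surface the OpenTelemetry IDs with tracing-opentelemetry is to read them off the current span's context. A minimal sketch, not taken from the TGI codebase and assuming compatible opentelemetry/tracing-opentelemetry versions:

// Sketch only: assumes the OpenTelemetry layer is already installed in the router.
use opentelemetry::trace::TraceContextExt; // .span() on an otel Context
use tracing_opentelemetry::OpenTelemetrySpanExt; // .context() on a tracing::Span

fn log_otel_ids() {
    let otel_context = tracing::Span::current().context();
    let span_ref = otel_context.span();
    let span_context = span_ref.span_context();
    // These are the IDs Tempo indexes (the spanid/traceid fields in the
    // distributor log above), unlike the span_trace_id shown by default.
    tracing::info!(
        otel_trace_id = ?span_context.trace_id(),
        otel_span_id = ?span_context.span_id(),
        "OpenTelemetry ids for the current span"
    );
}

Emitting these two fields next to the existing span_trace_id would presumably make the log entries searchable in Tempo.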
That's very interesting, thanks for sharing. It seems to be a bug in the code that generates the trace ID. We never run into this issue on our side (since we always receive a trace ID inside the header of the request), so I'm not sure where the bug is exactly. I will create a PR with a fix once I find it.
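To illustrate the point about request headers: if a caller sends a W3C trace context header, the propagator-extracted remote context becomes the parent, and the server span reuses the incoming trace ID, so the ID-generation path is never exercised. A hypothetical sketch of that extraction step (not TGI's actual code; opentelemetry-http's HeaderExtractor is assumed to be available):

// Hypothetical illustration: a request carrying a header such as
//   traceparent: 00-<trace-id, 32 hex chars>-<parent-span-id, 16 hex chars>-01
// yields a remote parent context, and the server span inherits that trace ID
// instead of generating a fresh one.
use axum::http::HeaderMap;
use opentelemetry::global;
use opentelemetry_http::HeaderExtractor;

fn parent_context(headers: &HeaderMap) -> opentelemetry::Context {
    global::get_text_map_propagator(|propagator| propagator.extract(&HeaderExtractor(headers)))
}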
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.