Expose traces via OpenTelemetry to enable distributed tracing for production use cases
Feature Request
If this is a feature request, please fill out the following form in full:
Describe the problem the feature is intended to solve
Running TF Serving at scale often requires debugging latency through the system. Because latency metrics are exposed with fixed histogram buckets, granular detail can be lost (e.g. overhead due to poor tuning of the system). Exposing traces via OpenTelemetry would increase the observability of TF Serving as part of a larger architecture. This observability is especially critical in ML, where payload sizes can be large, and in recommender systems, where RPS is high and target latencies are low.
Describe the solution
Expose traces of the higher-level functions of TF Serving via an open standard such as OpenTelemetry.
Describe alternatives you've considered
There are no alternatives that provide the same observability without requiring us to maintain a separate fork of this repository. We already use the existing TF Serving metrics and the TensorBoard profiler; however, TensorBoard profiling output is not in a consumable format for tracing SaaS products.
Additional context
Related to: https://github.com/tensorflow/serving/issues/1955
Is there any update on this request (now >18 months old)? Another Google project, Kubernetes, released OpenTelemetry tracing support over a year ago: https://kubernetes.io/blog/2022/12/01/runtime-observability-opentelemetry/.
It would be nice to know whether there are any timelines/plans to add similar support to TensorFlow Serving.