feat: improving observability & tracing

Open 0xMochan opened this issue 9 months ago • 3 comments

  • [x] I have looked for existing issues (including closed) about this

Feature Request

Standardize and improve tracing calls so that they follow one uniform design across all providers.

Motivation

Debugging rig agents, especially with complex routing, is a pain. Where things are traced and how they are formatted is inconsistent across most models, and there isn't any standardization.

Observability is an important facet that serious developers will need in order to grow and scale a platform on rig. If a business has deployed personalized agents for its clients, it'll be essential to be able to react to and study issues that occur with agents, pull up logs, and rely on rig for consistent observability and rapid diagnosis.

Proposal

  • all custom tracing logic per provider should be canned unless it's specific (like rig-eternalai's custom prompt logic)
  • generic agent tracing should follow a specific pattern and provide extra data in a form that allows for observability
  • consider integration with a platform (like logfire)

Alternatives

  • simplified tracing w/ observability at a future date

0xMochan, Apr 08 '25 20:04

Looks like logfire just uses OTel under the hood. I'll have a look and see what we might need to figure out beforehand. I've worked with tracing-opentelemetry before, but I'm not entirely sure whether we need to create our own subscriber.

edit: Following on from this, it doesn't look too bad. I think we can probably get by, to start with, with good tracing messages and proper key-value fields (which model is being used, the provider, tokens used, whether streaming is used, whether the call succeeded, etc.).
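
To make the key-value idea concrete, here is a minimal, std-only sketch of what a uniform field set could look like. `CompletionEvent` and its field names are hypothetical and not part of rig; in practice the same fields would presumably be passed as structured fields to the `tracing` macros rather than rendered by hand.

```rust
use std::fmt;

/// Hypothetical provider-agnostic record of one completion call.
/// The field names are assumptions, chosen to mirror the list above.
#[derive(Debug, Clone, PartialEq)]
pub struct CompletionEvent {
    pub provider: String,
    pub model: String,
    pub streaming: bool,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub success: bool,
}

impl CompletionEvent {
    /// Flatten the event into ordered key/value pairs so that every
    /// provider reports the same keys in the same order.
    pub fn fields(&self) -> Vec<(&'static str, String)> {
        vec![
            ("provider", self.provider.clone()),
            ("model", self.model.clone()),
            ("streaming", self.streaming.to_string()),
            ("input_tokens", self.input_tokens.to_string()),
            ("output_tokens", self.output_tokens.to_string()),
            ("success", self.success.to_string()),
        ]
    }
}

impl fmt::Display for CompletionEvent {
    /// Render the fields as a single `key=value` line.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let line = self
            .fields()
            .iter()
            .map(|(k, v)| format!("{k}={v}"))
            .collect::<Vec<_>>()
            .join(" ");
        write!(f, "{line}")
    }
}
```

Keeping the keys in one place means a log query such as "all failed calls for a given model" works the same way regardless of which provider produced the event.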

joshua-mo-143, Apr 08 '25 21:04

Yeah, there's no reason to use the logfire crate directly in the library. I think you can use the platform with the normal tracing crate (I have shiny tool syndrome). I think we can get by with just tracing itself, actually.


Stopher and I discussed that good tracing isn't just a matter of adding tracing calls; it also means reorganizing things like providers so that they get tracing automatically. For example, we could add CompletionRequest modeling for each provider and require only From impls, so that impl CompletionModel is just a matter of setting assoc. types, which auto-generates everything else, including good code-gen.
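
A rough sketch of that shape, under heavy assumptions: the trait below is NOT rig's actual `CompletionModel`, just an illustration of "set the assoc. types, get tracing for free". `EchoModel`, `EchoRequest`, `EchoResponse`, the `PROVIDER` const and the `Vec<String>` log (a stand-in for real `tracing` events) are all hypothetical.

```rust
/// Hypothetical provider-agnostic request.
#[derive(Debug, Clone)]
pub struct CompletionRequest {
    pub prompt: String,
}

/// Hypothetical provider-agnostic response.
#[derive(Debug, Clone, PartialEq)]
pub struct CompletionResponse {
    pub text: String,
    pub tokens_used: u64,
}

pub trait CompletionModel {
    const PROVIDER: &'static str;
    /// Provider wire types; only the conversions are required.
    type Request: From<CompletionRequest>;
    type Response: Into<CompletionResponse>;

    fn model(&self) -> &str;
    /// The only provider-specific logic: send the provider's own request.
    fn send(&self, request: Self::Request) -> Result<Self::Response, String>;

    /// Provided method: conversion and logging are written once here,
    /// so every provider emits the same fields.
    fn completion(
        &self,
        request: CompletionRequest,
        log: &mut Vec<String>, // stand-in for a tracing subscriber
    ) -> Result<CompletionResponse, String> {
        let result: Result<CompletionResponse, String> =
            self.send(request.into()).map(Into::into);
        let (provider, model) = (Self::PROVIDER, self.model());
        match &result {
            Ok(r) => log.push(format!(
                "provider={provider} model={model} success=true tokens_used={}",
                r.tokens_used
            )),
            Err(e) => log.push(format!(
                "provider={provider} model={model} success=false error={e}"
            )),
        }
        result
    }
}

/// Toy provider that echoes the prompt back.
pub struct EchoModel;
pub struct EchoRequest {
    body: String,
}
pub struct EchoResponse {
    echoed: String,
}

impl From<CompletionRequest> for EchoRequest {
    fn from(r: CompletionRequest) -> Self {
        EchoRequest { body: r.prompt }
    }
}

impl From<EchoResponse> for CompletionResponse {
    fn from(r: EchoResponse) -> Self {
        // Fake token count: one token per whitespace-separated word.
        let tokens_used = r.echoed.split_whitespace().count() as u64;
        CompletionResponse { text: r.echoed, tokens_used }
    }
}

impl CompletionModel for EchoModel {
    const PROVIDER: &'static str = "echo";
    type Request = EchoRequest;
    type Response = EchoResponse;

    fn model(&self) -> &str {
        "echo-1"
    }

    fn send(&self, request: EchoRequest) -> Result<EchoResponse, String> {
        if request.body.is_empty() {
            Err("empty prompt".to_string())
        } else {
            Ok(EchoResponse { echoed: request.body })
        }
    }
}
```

With this layout, a provider author writes two `From` impls and `send`, and never touches the logging code, which is what keeps the trace output consistent.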

0xMochan, Apr 18 '25 18:04