langstream Introduce logging level for pipelines

Currently, most agents are composable. When agents are composed into a single processor, they don't write to a topic between steps; the record is only manipulated in memory. This hides what happens at each step of the processor, so it's very hard to follow the computation when there's a misconfiguration or an edge case.

Most of the time, both in production (with poisoned messages) and during development (normal messages), the first thing you want to do is know more about each step to find the root cause.

One solution would be to introduce agent-level logging that both the runtime framework and the agent implementations can leverage.

Since the execution plan cannot be guessed, the logging level must be configured at pipeline granularity.

pipeline: xxx
topics: []
log: info | debug  
...

The first implementation would have the composite agent log the record at each inner step. In production, with large volumes of data, this could hurt latency, so it has to be used with caution.
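To make the idea concrete, here is a minimal sketch of a composite processor that logs the record after each inner step only when the debug level is enabled. It is not LangStream code: `run_composite`, the `(agent_id, transform)` step shape and the logger name are all hypothetical, and records are assumed to be plain dicts.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Hypothetical step shape: (agent id, in-memory transform)
Step = Tuple[str, Callable[[Record], Record]]

# Hypothetical logger name; its level stands in for the pipeline's `log` setting
logger = logging.getLogger("pipeline.composite")


def run_composite(steps: List[Step], record: Record) -> Record:
    """Apply each inner step in memory and log the result at DEBUG level."""
    current = record
    for agent_id, transform in steps:
        current = transform(current)
        # logger.debug formats its arguments lazily, so the record is not
        # rendered to a string unless DEBUG is enabled for this logger.
        logger.debug("step=%s record=%r", agent_id, current)
    return current
```

With the logger at `INFO` the loop only pays for a level check per step; switching it to `DEBUG` prints one line per inner agent, which is where the latency cost mentioned above would come from.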

An alternative would be to have a debug topic for each step, but this is extremely expensive and hard to turn on/off.

Oct 06 '23 07:10 nicoloboschi

I am not sure that dumping the records is a good idea; as you say in the proposal, it may be disruptive.

In some initial discussions about this problem we (with @dlg99) said it would be good to have the ability to copy the logs of the agent to a topic, and maybe also the records.

I think that it is better to tackle specific problems and find a good answer.

What's the problem you want to solve? See what's going in and out of an agent? Or debug the processing?

Oct 06 '23 08:10 eolivelli

The main pain point is that I want to see what is in the record at each step. The focus is on development of the pipeline. I created a pipeline with X steps that perform structure manipulation (add field, get query result, compute). Ideally I write the pipeline once (with end-to-end topics, no intermediate ones) and I can debug each step without changing every step. Writing to an intermediate topic is expensive and it changes the execution plan (if we don't add specific support), therefore I can't even update the application.

With this solution it's just a switch to turn on/off, and I can update the existing application. I know this could become a double-edged sword.

But users are used to logging levels and normally they know that running with debug on kills performance.

Another solution would be to be able to write the debug output both to logs and to topics.

debug: 
  enabled: true|false
  topic: xxx # same topic for all agents, fixed json structure {from-agent: xx, source-record: xxx, record: yyy}
  log: true|false
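As a rough illustration of how that config could drive the output, the sketch below builds the fixed JSON structure and sends it to the topic and/or the log depending on the flags. Everything here is an assumption for discussion: `debug_envelope`, `emit_debug` and the `publish` callback are hypothetical names, and records are assumed to be JSON-serializable.

```python
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("pipeline.debug")  # hypothetical logger name


def debug_envelope(agent_id: str, source_record: Any, record: Any) -> str:
    """Build the fixed structure {from-agent, source-record, record} as JSON."""
    return json.dumps(
        {"from-agent": agent_id, "source-record": source_record, "record": record},
        sort_keys=True,
    )


def emit_debug(
    config: Dict[str, Any],
    agent_id: str,
    source_record: Any,
    record: Any,
    publish: Callable[[str, str], None],
) -> None:
    """Route one debug entry according to the `debug` section of the pipeline."""
    if not config.get("enabled"):
        return  # nothing is serialized when debugging is off
    payload = debug_envelope(agent_id, source_record, record)
    if config.get("topic"):
        # `publish(topic, payload)` is a stand-in for whatever producer is used
        publish(config["topic"], payload)
    if config.get("log"):
        logger.debug(payload)
```

Because the same topic is shared by all agents, the `from-agent` field is what lets a consumer tell the steps apart.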

Oct 06 '23 08:10 nicoloboschi

Let's see this from another point of view. Let's not ask users to change the pipeline files.

What about adding some runtime configuration for the application that you can turn on and off without changing the pipeline?

langstream apps configure APP --executor xx --log-input true --topic TOPICNAME

langstream apps configure APP --forward-logs --level INFO --topic TOPICNAME

Oct 06 '23 09:10 eolivelli

Please note that the control plane is able to issue commands to the pods (as there is an HTTP endpoint, already used for info), so we can enable runtime debugging and features without restarting the pods.
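On the pod side, toggling at runtime without a restart could be sketched as a small settings holder that a command handler updates and the agent reads for each record. This is only an illustration under assumptions: `DebugSettings`, `RuntimeDebugSettings` and the field names (loosely mirroring the flags proposed above) are hypothetical, and the HTTP wiring is left out.

```python
import threading
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DebugSettings:
    """Hypothetical runtime options, loosely mirroring the proposed flags."""
    log_input: bool = False
    forward_logs: bool = False
    level: str = "INFO"
    topic: Optional[str] = None


class RuntimeDebugSettings:
    """Holds the current settings; safe to update while records are flowing."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current = DebugSettings()

    def update(self, **changes) -> DebugSettings:
        # Would be called by the (assumed) command handler on the pod
        with self._lock:
            self._current = replace(self._current, **changes)
            return self._current

    def snapshot(self) -> DebugSettings:
        # Called by the agent per record; returns an immutable value
        with self._lock:
            return self._current
```

Since each snapshot is an immutable value, a record that is already being processed keeps a consistent view even if a command arrives halfway through.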

Oct 06 '23 09:10 eolivelli