
Make child workflows create OpenTelemetry child spans

Open qianl15 opened this issue 6 months ago • 2 comments

Copy the original discussions from https://github.com/dbos-inc/dbos-transact-py/pull/317 (@dbmikus)

Make child workflows create OpenTelemetry child spans so that you can track execution across sub-workflows.

Testing:

  • tested running against a local workflow that creates child workflows

Issues to fix:

  • Queue("...").enqueue_async workflows do not create child spans

When I create child workflows via a queue like so:

myqueue = Queue("myqueue", concurrency=25)

@DBOS.workflow()
async def wf():
    await myqueue.enqueue_async(sub_wf)

@DBOS.workflow()
async def sub_wf():
    pass

the sub_wf workflows show up as new traces.

It would be useful to use standard OTel tooling for tracking workflows in DBOS, if possible.

I understand that there are pain points with OTel and very long-running traces. I've previously put traces on Kafka and SQS, but those were consumed relatively quickly. TBH, I'm not sure of the ramifications of having a trace that can exist for days. There might be no problems, or it might break OTel collection. There are ways to link two traces together, which might alleviate long-lived trace problems.

Another, simpler solution is to keep child workflows within the same trace as long as they are not executed on a different executor.

For context, we use OpenTelemetry for observability and sometimes data collection of our LLMs. We record some function inputs/outputs on OTel spans and also record log messages in the spans. Being able to debug the LLM flow across spans is very helpful, and other LLM ops products support ingesting OTel traces.

qianl15, Apr 28 '25 18:04

Copied from the original discussion.

Thanks for the question! Right now, enqueue doesn't create child spans because enqueued tasks are often executed asynchronously on a different executor or machine, sometimes hours later, long after the parent workflow has finished. In these cases, we can't pass the parent span along in memory, and creating child spans could lead to confusing or misleading traces.

That said, we agree it would be useful to track execution across parent and child workflows in some scenarios. Our team has been discussing potential solutions, including persisting the span ID in Postgres and propagating it from there. Definitely something we're actively thinking about.


Update:

Do steps / child-workflows only execute on a different executor when using queues?

Steps/child workflows may execute on a different executor when 1) using queues or 2) recovering from a crash on the original executor. In both cases, we currently create a new span and treat them as a new execution. Therefore, to properly propagate spans we'll need to persist them in the database.
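
As a rough illustration of what "persist them in the database" could involve (a sketch only, not how DBOS implements or plans to implement it): a span's identity can be serialized in the W3C `traceparent` format, stored next to the workflow ID, and parsed again on whichever executor picks the work up. The table name `workflow_trace_context`, the helper names, and the use of SQLite in place of Postgres are all assumptions made so the example is self-contained.

```python
import re
import sqlite3
from typing import Optional, Tuple

# W3C Trace Context, version 00: 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Serialize a span's identity as a traceparent string."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(value: str) -> Tuple[int, int, bool]:
    """Parse a traceparent string into (trace_id, span_id, sampled)."""
    match = _TRACEPARENT.match(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id = int(match.group(1), 16)
    span_id = int(match.group(2), 16)
    if trace_id == 0 or span_id == 0:
        raise ValueError("all-zero trace or span ID is invalid")
    return trace_id, span_id, bool(int(match.group(3), 16) & 0x01)


# Hypothetical table; SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow_trace_context ("
    "workflow_id TEXT PRIMARY KEY, traceparent TEXT NOT NULL)"
)


def save_parent_context(workflow_id: str, traceparent: str) -> None:
    """Called where the child workflow is enqueued."""
    conn.execute(
        "INSERT OR REPLACE INTO workflow_trace_context VALUES (?, ?)",
        (workflow_id, traceparent),
    )


def load_parent_context(workflow_id: str) -> Optional[str]:
    """Called on the executor that dequeues or recovers the workflow."""
    row = conn.execute(
        "SELECT traceparent FROM workflow_trace_context WHERE workflow_id = ?",
        (workflow_id,),
    ).fetchone()
    return row[0] if row else None


# Enqueue side: record the parent's span identity for a child workflow.
header = format_traceparent(trace_id=0xABC, span_id=0x123)
save_parent_context("child-wf-1", header)

# Executor side: restore it, possibly much later and on another machine.
restored = parse_traceparent(load_parent_context("child-wf-1"))
print(header, restored)
```

The restored IDs could then be used either to parent the new span or to attach a span link, depending on how long-lived the resulting trace is allowed to be.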

For now, we've made DBOS.tracer available in the public interface (https://github.com/dbos-inc/dbos-transact-py/pull/306), so you may add your own tracing spans in your functions.

qianl15, Apr 28 '25 18:04

Quick question about this:

For now, we've made DBOS.tracer available in the public interface (https://github.com/dbos-inc/dbos-transact-py/pull/306), so you may add your own tracing spans in your functions.

Is there any way to use this to make sub-workflows appear as spans within my parent workflow trace, when I directly call those sub-workflows?

My example workflow code:

@DBOS.workflow()
async def wf():
    await sub_wf()

@DBOS.workflow()
async def sub_wf():
    pass

I believe DBOS will create a new trace when I call sub_wf.

dbmikus, Apr 29 '25 17:04