
How do we model activities that could start and end in different processes?

Open aelij opened this issue 4 years ago • 3 comments

Describe your environment. We are using distributed traces in a workflow system.

What are you trying to achieve? We'd like the entire workflow to have an encompassing activity. Due to the nature of a distributed workflow, this activity could potentially start and stop on different machines, depending on workflow recovery options. AFAIK there's no way to use the Activity class like this - it has to start and stop in a single process. How about adding a method like ActivitySource.FromExisting(string id, DateTime startTime) to resume an existing activity?

aelij avatar Dec 23 '21 12:12 aelij

We need to follow the specification regarding span creation https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span-creation - instantiating or acquiring an activity with a specific id is not part of the current specification.

That said, Activity is not really designed in a way that would make starting one in one process and ending it in another very easy (or perhaps even possible).

I'm curious what a single activity encompassing a workflow gets you? Why not represent the work of each process as a separate span within the same trace?
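A minimal sketch of the "separate span per process, same trace" approach, assuming .NET 5 or later (where W3C is the default `Activity` id format). The source name `Example.Workflow`, the `WorkflowTracing` class, and its method names are hypothetical, and how the parent id is persisted between processes (for example, alongside the workflow state) is left to the workflow engine.

```csharp
using System;
using System.Diagnostics;

public static class WorkflowTracing
{
    // Hypothetical source name, chosen for this example only.
    public static readonly ActivitySource Source = new("Example.Workflow");

    // Without a listener (or an OpenTelemetry TracerProvider subscribed to the
    // source), StartActivity returns null. This listener samples everything.
    public static ActivityListener ListenToAll()
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == Source.Name,
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }

    // Process A: start (and finish) a span for the work done here, and return
    // its W3C id ("00-<trace-id>-<span-id>-<flags>") so it can be persisted.
    public static string StartWorkflow()
    {
        using var root = Source.StartActivity("workflow.start", ActivityKind.Internal);
        return root!.Id!;
    }

    // Process B (possibly another machine): start a span for one step as a
    // child of the persisted id, so it lands in the same trace.
    public static Activity? StartStep(string name, string persistedParentId)
    {
        return ActivityContext.TryParse(persistedParentId, null, out var parent)
            ? Source.StartActivity(name, ActivityKind.Internal, parent)
            : Source.StartActivity(name); // fall back to a new trace if the id is unusable
    }
}
```

Each step's span shares the trace id of the span created in `StartWorkflow` and records that span as its parent. This does not give a single span whose duration covers the whole workflow; the span from process A ends when that process finishes its own work.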

alanwest avatar Dec 28 '21 22:12 alanwest

> I'm curious what a single activity encompassing a workflow gets you? Why not represent the work of each process as a separate span within the same trace?

Consider a system like Durable Task. Each orchestration represents a series of calls to activities. For example, video encoding has an Encode and an Email activity, with the orchestration engine responsible for managing the workflow as a whole. The workflow can be stopped at any point (e.g. by a machine crashing), and the engine would restart and execute only the incomplete steps. To me it makes a lot of sense for this whole thing to be represented by a parent span, where each activity has a child span. Otherwise, if you run 2 workflows within another context (e.g. a REST call), you would not know which activities belong to which workflow.

> Activity is not really designed in a way that would make starting and ending one in a different process very easy (or perhaps not possible).

I agree it's not an easy scenario to implement, but it is possible. I've worked around this limitation by setting the _traceId and _spanId fields, and then calling Stop (without Start) - it seems to work well.

aelij avatar Dec 29 '21 09:12 aelij

We use a similar asynchronous job/workflow processing system for our SaaS product. The system may process a single job in multiple iterations, each of which may be scheduled to a different worker process. Finding that the OpenTelemetry spec does not support this use case was quite an unpleasant surprise.

semyon2105 avatar May 23 '22 13:05 semyon2105