spin icon indicating copy to clipboard operation
spin copied to clipboard

Spin-telemetry: consider eliminating the use of `set_global_default`

Open Mossaka opened this issue 1 year ago • 4 comments

Today spin_telemetry::init() calls registry::init() to build a tracing registry subscriber. Inside of that API, it calls dispatcher::set_global_default(). According to its documentation:

Note: Libraries should NOT call set_global_default()! That will cause conflicts when executables try to set them later.

When used as a library, spin_telemetry should not call set_global_default as it might cause conflicts with executables that depends on it. E.g. the spin shim.

Alternative

  1. consider using opentelemetry tracer instrumentation instead of that of tracing.
  2. consider using set_default() for every thread that spin runtime creates (albert much more invasive to the code)

Mossaka avatar Jul 12 '24 18:07 Mossaka

Note: Libraries should NOT call set_global_default()! That will cause conflicts when executables try to set them later.

spin_telemetry::init() is meant to be called by binaries, not libraries.

  1. consider using opentelemetry tracer instrumentation instead of that of tracing.

This would only cover part of the functionality of spin-telemetry and would still normally involve setting a global.

  1. consider using set_default() for every thread that spin runtime creates (albert much more invasive to the code)

This may be plausible by hooking into tokio, but I'm not sure I understand how this would work with the shim

Could you say more about the issues you are seeing? I think we should be able to find less invasive options for shim interop.

lann avatar Jul 12 '24 18:07 lann

Could you say more about the issues you are seeing?

Yes, I had long discussions with @calebschoepp on this topic and the main issue we are seeing is that, in containerd spin shim, there are two places where set_global_default is invoked. One at the entrypoint of the shim binary, which is supposed to collect traces from shim functions. Another is within the container sandbox that initializes a spin trigger and runtime, which is to collect traces from spin runtime.

Note that the shim process is the parent of the spin runtime process, but since it's invoked by clone3() the spin runtime process inherits the global states the shim process has, including global states for the tracing collector. Hence a panic caused by the second invocation of the set_global_default.

The tricky part of this case is that both the spin shim and spin runtime processes are multi-threaded.

spin_telemetry::init() is meant to be called by binaries, not libraries.

You are right that it should be called by the binary. but in the case of spin shim, spin_telemetry::init() is invoked at the point where spin runtime is initializing. you may ask, why don't we call spin_telemetry::init() at the main() function of the spin shim binary? That's because at that moment, the binary hasn't parsed the container spec to get OTLP environment variables such as the endpoint, and it's not within the sandbox that is supposed to contain the spin runtime.

This would only cover part of the functionality of spin-telemetry and would still normally involve setting a global.

My understanding is that opentelemetry crate supports namespaced trace instrumentation, right? It should help the case I described above.

Mossaka avatar Jul 22 '24 20:07 Mossaka

My understanding is that opentelemetry crate supports namespaced trace instrumentation, right? It should help the case I described above.

Namespaced trace instrumentation would potentially help us get around setting a global, but I think switching Spin from tracing to opentelemetry is a non-starter. Beyond just producing traces we also rely on tracing in Spin for:

  • Producing metrics
  • Implementing WASI Observe

calebschoepp avatar Jul 23 '24 15:07 calebschoepp

since it's invoked by clone3() the spin runtime process inherits the global states the shim process has, including global states for the tracing collector.

Is this intentional or incidental behavior? I believe clone3 in particular doesn't require sharing memory with a child.

lann avatar Jul 23 '24 15:07 lann

I don't believe this is a relevant issue anymore. If you disagree please reopen this.

calebschoepp avatar Oct 10 '25 01:10 calebschoepp