opentelemetry-cpp icon indicating copy to clipboard operation
opentelemetry-cpp copied to clipboard

[EXAMPLES] Improve ForceFlush call to also work with NoopTracerProvider

Open ecourreges-orange opened this issue 1 year ago • 7 comments

Is your feature request related to a problem? Yes, if you copy and paste code from the examples, which most users will do, you end up with the code from cleanupTracer:

void CleanupTracer()
{
  // We call ForceFlush to prevent to cancel running exportings, It's optional.
  opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
      trace::Provider::GetTracerProvider();
  if (provider)
  {
    static_cast<trace_sdk::TracerProvider *>(provider.get())->ForceFlush();
  }

  std::shared_ptr<opentelemetry::trace::TracerProvider> none;
  trace::Provider::SetTracerProvider(none);
}

Which is code that does not work when called before any initialization.
This may happen at least in unit tests of client code (happened to me and made me search for the best solution for 1h).
Also, it is better to call ForceFlush with a time argument so that people know they can use it, and you won't have issues posted like "why does my app not shutdown?".

Describe the solution you'd like I would like the example code to work in all cases, inited or not.

Describe alternatives you've considered Changing from static_cast to dynamic cast and checking if the provider exists fixes this, Also adding a 10s timeout to ForceFlush should cover typical cases gracefully.

void CleanupTracer()
{
  // We call ForceFlush to prevent cancelling running exports, It's optional.
  opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
      trace::Provider::GetTracerProvider();
    if (provider)
    {
        trace_sdk::TracerProvider* tracerProvider = dynamic_cast<trace_sdk::TracerProvider*>(provider.get());
        if ( tracerProvider) {
            // Remove arg if you want to wait forever
            tracerProvider->ForceFlush(std::chrono::milliseconds(10000));
        }
    }
  std::shared_ptr<opentelemetry::trace::TracerProvider> none;
  trace::Provider::SetTracerProvider(none);
}

If you would like, I can provide the PR for the 4 examples that i found that have this code: https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/http_log_main.cc#L77
https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/grpc_log_main.cc#L61
https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/http_main.cc#L51 https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/grpc_main.cc#L45

ecourreges-orange avatar Jan 23 '24 13:01 ecourreges-orange

Make sense, but dynamic_cast could only be used when RTTI is enabled. Should we keep both the options, something like this (not tested):

void CleanupTracer() {
    opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
        trace::Provider::GetTracerProvider();

    if (provider) {
#ifdef OPENTELEMETRY_RTTI_ENABLED
        // When RTTI is enabled, use dynamic_cast for safety
        trace_sdk::TracerProvider* sdkProvider = dynamic_cast<trace_sdk::TracerProvider*>(provider.get());
        if (sdkProvider) {
            sdkProvider->ForceFlush();
        }
#else
        // Without RTTI, use static_cast
        // Note: This is safe only if the TracerProvider is correctly initialized as trace_sdk::TracerProvider
        static_cast<trace_sdk::TracerProvider*>(provider.get())->ForceFlush();
#endif
    }

    std::shared_ptr<opentelemetry::trace::TracerProvider> none;
    trace::Provider::SetTracerProvider(none);
}
   

Assigned to you @ecourreges-orange. Thanks for troubleshooting :)

lalitb avatar Jan 24 '24 08:01 lalitb

Calling GetTracerProvider(), which returns an API level object, and somehow getting at the SDK provider from it, is by definition wrong.

A possible change is for the code to keep a global variable on the SDK provider used instead, and never call GetTracerProvider() from the SDK init/cleanup code in the application.

Instrumented code still can call GetTracerProvider() of course.

marcalff avatar Jan 24 '24 11:01 marcalff

This issue is available for anyone to work on. Make sure to reference this issue in your pull request. :sparkles: Thank you for your contribution! :sparkles:

github-actions[bot] avatar Jan 24 '24 11:01 github-actions[bot]

Calling GetTracerProvider(), which returns an API level object, and somehow getting at the SDK provider from it, is by definition wrong.

A possible change is for the code to keep a global variable on the SDK provider used instead, and never call GetTracerProvider() from the SDK init/cleanup code in the application.

Instrumented code still can call GetTracerProvider() of course.

Agree. This would be the correct approach.

lalitb avatar Jan 24 '24 13:01 lalitb

I think there is more to this bug report.

Calling ForceFlush on the NoopTracerProvider / NoopTracer should be illegal in the first place.

The API spec only defines:

  • a TracerProvider::GetTracer() operation
  • a Tracer::CreateSpan() operation

there is no Flush at the API level.

opentelemetry-cpp exposed extra ForceFlush() and Close() methods on the API tracer, which is outside the specs, and not needed to instrument an application.

These methods belong to the SDK only.

For ABI_VERSION_1, removing these will be a breaking change, so the best is to just leave them.

For ABI_VERSION_2, I think ForceFlush() and Close() should be removed from the API, and only be available in the SDK.

cc @open-telemetry/cpp-approvers

marcalff avatar Mar 14 '24 10:03 marcalff

Instead of having two code paths (nightmares) for static_cast and dynamic_cast - can't we use something akin to Qt's cast - https://doc.qt.io/qt-6/qobject.html#qobject_cast - it's probably more work to add this - but might be worth doing instead.

dynamic_cast also tends to get much slower with larger apps (though it doesn't look like it's called in the critical path, but may one day).

malkia avatar May 09 '24 18:05 malkia

Instead of having two code paths (nightmares) for static_cast and dynamic_cast - can't we use something akin to Qt's cast - https://doc.qt.io/qt-6/qobject.html#qobject_cast - it's probably more work to add this - but might be worth doing instead.

dynamic_cast also tends to get much slower with larger apps (though it doesn't look like it's called in the critical path, but may one day).

Or do not cast at all, see PR #2664

marcalff avatar May 10 '24 07:05 marcalff