opentelemetry-cpp
opentelemetry-cpp copied to clipboard
[EXAMPLES] Improve ForceFlush call to also work with NoopTracerProvider
Is your feature request related to a problem? Yes, if you copy and paste code from the examples, which most users will do, you end up with the code from cleanupTracer:
void CleanupTracer()
{
// We call ForceFlush to prevent to cancel running exportings, It's optional.
opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
trace::Provider::GetTracerProvider();
if (provider)
{
static_cast<trace_sdk::TracerProvider *>(provider.get())->ForceFlush();
}
std::shared_ptr<opentelemetry::trace::TracerProvider> none;
trace::Provider::SetTracerProvider(none);
}
Which is code that does not work when called before any initialization.
This may happen at least in unit tests of client code (happened to me and made me search for the best solution for 1h).
Also, it is better to call ForceFlush with a time argument so that people know they can use it, and you won't have issues posted like "why does my app not shutdown?".
Describe the solution you'd like I would like the example code to work in all cases, inited or not.
Describe alternatives you've considered Changing from static_cast to dynamic cast and checking if the provider exists fixes this, Also adding a 10s timeout to ForceFlush should cover typical cases gracefully.
void CleanupTracer()
{
// We call ForceFlush to prevent cancelling running exports, It's optional.
opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
trace::Provider::GetTracerProvider();
if (provider)
{
trace_sdk::TracerProvider* tracerProvider = dynamic_cast<trace_sdk::TracerProvider*>(provider.get());
if ( tracerProvider) {
// Remove arg if you want to wait forever
tracerProvider->ForceFlush(std::chrono::milliseconds(10000));
}
}
std::shared_ptr<opentelemetry::trace::TracerProvider> none;
trace::Provider::SetTracerProvider(none);
}
If you would like, I can provide the PR for the 4 examples that i found that have this code:
https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/http_log_main.cc#L77
https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/grpc_log_main.cc#L61
https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/http_main.cc#L51
https://github.com/open-telemetry/opentelemetry-cpp/blob/a2323281adf60de8351c64861f313137bbbcebfd/examples/otlp/grpc_main.cc#L45
Make sense, but dynamic_cast could only be used when RTTI is enabled. Should we keep both the options, something like this (not tested):
void CleanupTracer() {
opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
trace::Provider::GetTracerProvider();
if (provider) {
#ifdef OPENTELEMETRY_RTTI_ENABLED
// When RTTI is enabled, use dynamic_cast for safety
trace_sdk::TracerProvider* sdkProvider = dynamic_cast<trace_sdk::TracerProvider*>(provider.get());
if (sdkProvider) {
sdkProvider->ForceFlush();
}
#else
// Without RTTI, use static_cast
// Note: This is safe only if the TracerProvider is correctly initialized as trace_sdk::TracerProvider
static_cast<trace_sdk::TracerProvider*>(provider.get())->ForceFlush();
#endif
}
std::shared_ptr<opentelemetry::trace::TracerProvider> none;
trace::Provider::SetTracerProvider(none);
}
Assigned to you @ecourreges-orange. Thanks for troubleshooting :)
Calling GetTracerProvider(), which returns an API level object, and somehow getting at the SDK provider from it, is by definition wrong.
A possible change is for the code to keep a global variable on the SDK provider used instead, and never call GetTracerProvider() from the SDK init/cleanup code in the application.
Instrumented code still can call GetTracerProvider() of course.
This issue is available for anyone to work on. Make sure to reference this issue in your pull request. :sparkles: Thank you for your contribution! :sparkles:
Calling
GetTracerProvider(), which returns an API level object, and somehow getting at the SDK provider from it, is by definition wrong.A possible change is for the code to keep a global variable on the SDK provider used instead, and never call
GetTracerProvider()from the SDK init/cleanup code in the application.Instrumented code still can call
GetTracerProvider()of course.
Agree. This would be the correct approach.
I think there is more to this bug report.
Calling ForceFlush on the NoopTracerProvider / NoopTracer should be illegal in the first place.
The API spec only defines:
- a
TracerProvider::GetTracer()operation - a
Tracer::CreateSpan()operation
there is no Flush at the API level.
opentelemetry-cpp exposed extra ForceFlush() and Close() methods on the API tracer, which is outside the specs, and not needed to instrument an application.
These methods belong to the SDK only.
For ABI_VERSION_1, removing these will be a breaking change, so the best is to just leave them.
For ABI_VERSION_2, I think ForceFlush() and Close() should be removed from the API, and only be available in the SDK.
cc @open-telemetry/cpp-approvers
Instead of having two code paths (nightmares) for static_cast and dynamic_cast - can't we use something akin to Qt's cast - https://doc.qt.io/qt-6/qobject.html#qobject_cast - it's probably more work to add this - but might be worth doing instead.
dynamic_cast also tends to get much slower with larger apps (though it doesn't look like it's called in the critical path, but may one day).
Instead of having two code paths (nightmares) for static_cast and dynamic_cast - can't we use something akin to Qt's cast - https://doc.qt.io/qt-6/qobject.html#qobject_cast - it's probably more work to add this - but might be worth doing instead.
dynamic_cast also tends to get much slower with larger apps (though it doesn't look like it's called in the critical path, but may one day).
Or do not cast at all, see PR #2664