opentelemetry-rust icon indicating copy to clipboard operation
opentelemetry-rust copied to clipboard

opentelemetry-zipkin 0.28.0 no longer exporting successfully - seems to stall at connection stage

Open jpoisso opened this issue 1 year ago • 9 comments

Hello!

I have an application deployed in production that uses Actix Web along with Reqwest and opentelemetry to generate and export traces.

The current deployment works just fine on 0.27 but after migrating to 0.28 and adapting the signature, the export portion no longer completes and it hangs on API requests.

actix-web = { version = "4.9.0", features = ["actix-tls", "rustls-0_23", "default"], default-features = false }
actix-web-prometheus = { version = "0.1.2", features = ["process"], default-features = false }
actix-web-opentelemetry = { version = "0.21.0", features = [ "metrics", "metrics-prometheus"], default-features = false }
actix-web-codegen = { version = "4.3.0", features = ["default"], default-features = false }
actix-cors = { version = "0.7.0", default-features = false }
actix-tls =  { version = "3.4.0", features = ["rustls-0_23", "accept", "connect", "uri"], default-features = false }
actix-files = { version = "0.6.6", default-features = false }
reqwest = { version = "0.12.12", features = ["rustls-tls", "socks", "json", "stream", "http2", "h2", "cookies", "brotli", "multipart", "charset", "deflate", "gzip", "zstd"], default-features = false }
reqwest-middleware = { version = "0.4.1", features = ["rustls-tls", "http2", "json", "charset", "multipart"], default-features = false }
reqwest-tracing = { version = "0.5.6", features = ["opentelemetry_0_28"], default-features = false  }
rustls = { version = "0.23.23", features=["default", "brotli", "hashbrown", "prefer-post-quantum", "read_buf", "zlib", "logging", "tls12", "aws-lc-rs"], default-features = false }
rustls-pemfile = { version = "2.2.0", features=["std"], default-features = false }
opentelemetry = { version = "0.28.0", features = ["default"], default-features = false}
opentelemetry_sdk = { version = "0.28.0", features = ["rt-tokio", "tokio", "trace", "logs"], default-features = false}
opentelemetry-zipkin = { version = "0.28.0", features = ["reqwest-client"], default-features = false }
opentelemetry-http  = { version = "0.28.0", features = ["reqwest"], default-features = false}
global::set_text_map_propagator(opentelemetry_zipkin::Propagator::new());

// let zipkin_pipeline = opentelemetry_zipkin::new_pipeline()
//     .with_service_name(app_name)
//     .with_collector_endpoint(url)
//     .with_trace_config(Config::default())
//     .with_http_client(http_client)
//     .install_batch(opentelemetry_sdk::runtime::Tokio)
//     .map_err(|e| {
//         AppError::ServerError(format!("failed to setup zipkin pipeline: {e:?}."))
//     })?;

let exporter = ZipkinExporter::builder()
    .with_collector_endpoint(url)
    .with_http_client(http_client)
    .build()
    .map_err(|e| AppError::ServerError(format!("failed to setup zipkin exporter: {e:?}.")))?;

let provider = SdkTracerProvider::builder()
 .with_simple_exporter(exporter)
 .with_sampler(Sampler::AlwaysOn)
 .with_resource(
     Resource::builder_empty()
         .with_service_name(app_name)
         .build(),
 )
 .with_id_generator(RandomIdGenerator::default())
 .build();

I tried hitting our deployed Zipkin instance which uses TLS, and also a local docker instance (HTTP), both have the same behavior.

I tried with and without our custom Reqwest client as well.

2025-03-03T14:32:42.015017Z TRACE hyper_util::client::legacy::pool: checkout waiting for idle connection: ("http", 127.0.0.1:9411)
2025-03-03T14:32:42.015072Z TRACE hyper_util::client::legacy::connect::http: Http::connect; scheme=Some("http"), host=Some("127.0.0.1"), port=Some(Port(9411))
2025-03-03T14:32:42.015105Z DEBUG hyper_util::client::legacy::connect::http: connecting to 127.0.0.1:9411

It stalls at the hyper_util::client::legacy::connect::http: connecting to 127.0.0.1:9411 stage and locks up.

Has anyone else encountered this?

jpoisso avatar Mar 03 '25 14:03 jpoisso

if I switch to .with_batch_exporter(..), the requests complete but then I see an internal panic occur:

thread 'OpenTelemetry.Traces.BatchProcessor' panicked at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hyper-util-0.1.10/src/client/legacy/connect/dns.rs:122:24:
there is no reactor running, must be called from the context of a Tokio 1.x runtime

jpoisso avatar Mar 03 '25 15:03 jpoisso

@jpoisso Switch to reqwest-blocking to fix the issue with BatchProcessor. (it should be the default already in the new version)

(The docs need major improvements to avoid users hitting these issues :( We have spend some time on OTLP Exporter, so they may be in better shape. Zipkin is a lower priority, and docs are even worse :( )

cijothomas avatar Mar 03 '25 17:03 cijothomas

opentelemetry-zipkin = { version = "0.28.0", features = ["reqwest-blocking-client"], default-features = false }
        let provider = SdkTracerProvider::builder()
            .with_batch_exporter(exporter)
            // .with_simple_exporter(exporter)
            .with_resource(
                Resource::builder()
                    .with_service_name(app_name)
                    .build(),
            )
            .with_sampler(Sampler::AlwaysOn)
            .build();

if I use the blocking request client along with actix-web, I get this error:

Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context. stack backtrace: 0: rust_begin_unwind at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:692:5 1: core::panicking::panic_fmt at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panicking.rs:75:14 2: tokio::runtime::blocking::shutdown::Receiver::wait at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/blocking/shutdown.rs:51:21 3: tokio::runtime::blocking::pool::BlockingPool::shutdown at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/blocking/pool.rs:263:12 4: <tokio::runtime::blocking::pool::BlockingPool as core::ops::drop::Drop>::drop at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/blocking/pool.rs:284:9 5: core::ptr::drop_in_placetokio::runtime::blocking::pool::BlockingPool at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:523:1 6: core::ptr::drop_in_placetokio::runtime::runtime::Runtime at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:523:1 7: reqwest::blocking::wait::enter at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/reqwest-0.12.15/src/blocking/wait.rs:80:21 8: reqwest::blocking::wait::timeout at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/reqwest-0.12.15/src/blocking/wait.rs:13:5 9: reqwest::blocking::client::ClientHandle::new at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/reqwest-0.12.15/src/blocking/client.rs:1246:15 10: reqwest::blocking::client::ClientBuilder::build at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/reqwest-0.12.15/src/blocking/client.rs:114:9 11: <opentelemetry_zipkin::exporter::ZipkinExporterBuilder as core::default::Default>::default at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/opentelemetry-zipkin-0.28.0/src/exporter/mod.rs:51:17 12: opentelemetry_zipkin::exporter::ZipkinExporter::builder at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/opentelemetry-zipkin-0.28.0/src/exporter/mod.rs:25:9 13: cinema_authentication_sidecar::observability::traces::init_tracing at ./src/observability/traces.rs:95:24 14: cinema_authentication_sidecar::main::{{closure}} at ./src/main.rs:106:30 15: <tokio::task::local::RunUntil<T> as core::future::future::Future>::poll::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/local.rs:1041:42 16: tokio::task::local::LocalSet::with::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/local.rs:793:13 17: std::thread::local::LocalKey<T>::try_with at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:308:12 18: std::thread::local::LocalKey<T>::with at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:272:9 19: tokio::task::local::LocalSet::with at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/local.rs:791:9 20: <tokio::task::local::RunUntil<T> as core::future::future::Future>::poll at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/local.rs:1031:9 21: tokio::task::local::LocalSet::run_until::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/local.rs:689:19 22: <core::pin::Pin<P> as core::future::future::Future>::poll at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/future.rs:124:9 23: <core::pin::Pin<P> as core::future::future::Future>::poll at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/future.rs:124:9 24: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:733:54 25: tokio::task::coop::with_budget at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:167:5 26: tokio::task::coop::budget at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:133:5 27: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:733:25 28: tokio::runtime::scheduler::current_thread::Context::enter at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:432:19 29: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:732:36 30: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:820:68 31: tokio::runtime::context::scoped::Scoped<T>::set at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/scoped.rs:40:9 32: tokio::runtime::context::set_scheduler::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context.rs:180:26 33: std::thread::local::LocalKey<T>::try_with at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:308:12 34: std::thread::local::LocalKey<T>::with at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:272:9 35: tokio::runtime::context::set_scheduler at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context.rs:180:9 36: tokio::runtime::scheduler::current_thread::CoreGuard::enter at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:820:27 37: tokio::runtime::scheduler::current_thread::CoreGuard::block_on at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:720:19 38: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}} at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:200:28 39: tokio::runtime::context::runtime::enter_runtime at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/runtime.rs:65:16 40: tokio::runtime::scheduler::current_thread::CurrentThread::block_on at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/current_thread/mod.rs:188:9 41: tokio::runtime::runtime::Runtime::block_on_inner at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:368:47 42: tokio::runtime::runtime::Runtime::block_on at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:340:13 43: tokio::task::local::LocalSet::block_on at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/local.rs:646:9 44: actix_rt::runtime::Runtime::block_on at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/actix-rt-2.10.0/src/runtime.rs:138:9 45: actix_rt::system::SystemRunner::block_on at /home/jonathan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/actix-rt-2.10.0/src/system.rs:244:9 46: cinema_authentication_sidecar::main at ./src/main.rs:33:1 47: core::ops::function::FnOnce::call_once at /home/jonathan/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5 note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.

jpoisso avatar Mar 18 '25 18:03 jpoisso

@cijothomas having the very same issue on 0.30 and I can't seem to find a working combination

tried simple/batch with blocking/async

reneklacan avatar Jun 18 '25 21:06 reneklacan

@reneklacan Can you share a minimal repro app for us to easily investigate? (The documentation status of Zipkin exporter is still same, not very good. But once given a minimal repro, I can try to find what's wrong and fix it/document it)

cijothomas avatar Jun 18 '25 22:06 cijothomas

@cijothomas I actually didn't use zipkin (just otlp + sdk with axum), will try to create minimal repro app tomorrow or share what eventually worked for me

reneklacan avatar Jun 18 '25 23:06 reneklacan

I ended up switching to grpc/tonic which pretty much just worked in my case.

reneklacan avatar Jun 19 '25 11:06 reneklacan

I ended up switching to grpc/tonic which pretty much just worked in my case.

If you happen to have the repro for the non-working http/otlp, please do share it. We have integration tests for OTLP to catch issues, but we are missing some cases there, and want to enhance it based on issues reported by users.

cijothomas avatar Jun 19 '25 16:06 cijothomas

Still not working with either blocking or async Request clients. 0.27 was the last working version w.r.t to Zipkin.

When using async, the first call goes through and then the whole server hangs forever. When using blocking, the app fails to start due to the panic caused by 'blocking is not allowed'.

opentelemetry = { version = "0.30.0", features = ["trace"], default-features = false}
opentelemetry_sdk = { version = "0.30.0", features = ["rt-tokio", "tokio", "trace", "logs"], default-features = false}
opentelemetry-zipkin = { version = "0.30.0", features = ["reqwest-client"], default-features = false }
//opentelemetry-zipkin = { version = "0.30.0", features = ["reqwest-blocking-client"], default-features = false }
tracing-opentelemetry = {version = "0.31.0", features = ["tracing-log", "async-trait"], default-features = false }
global::set_text_map_propagator(opentelemetry_zipkin::Propagator::new());

let exporter = ZipkinExporter::builder()
    .with_collector_endpoint(url)
    .with_http_client(http_client)
    .build()
    .map_err(|e| {
        AppError::ServerError(format!("failed to create Zipkin exporter: {e:?}"))
    })?;

let tracer_provider = SdkTracerProvider::builder()
    .with_simple_exporter(exporter)
    .with_resource(
        Resource::builder_empty()
            .with_service_name(app_name.clone())
            .build(),
    )
    .build();

global::set_tracer_provider(tracer_provider.clone());

let tracer = tracer_provider.tracer(app_name);

let zipkin_layer = tracing_opentelemetry::layer()
    .with_location(true)
    .with_error_records_to_exceptions(true)
    .with_tracer(tracer);

let subscriber = Registry::default()
    .with(console_layer)
    .with(zipkin_layer)
    .with(filter_layer);

tracing::subscriber::set_global_default(subscriber).map_err(|e| {
    AppError::ServerError(format!(
        "failed to setup Zipkin telemetry tracing subscriber: {e:?}."
    ))
})?;

Here's my last working checkpoint on 0.27:

global::set_text_map_propagator(opentelemetry_zipkin::Propagator::new());
let zipkin_pipeline = opentelemetry_zipkin::new_pipeline()
    .with_service_name(app_name)
    .with_collector_endpoint(url)
    .with_trace_config(Config::default())
    .with_http_client(http_client)
    .install_batch(opentelemetry_sdk::runtime::Tokio)
    .map_err(|e| {
        AppError::ServerError(format!("failed to setup zipkin telemetry tracer: {e:?}."))
    })?;

let zipkin_layer = tracing_opentelemetry::layer()
    .with_location(true)
    .with_error_records_to_exceptions(true)
    .with_tracer(zipkin_pipeline);
let subscriber = Registry::default()
    .with(console_layer)
    .with(zipkin_layer)
    .with(filter_layer);

tracing::subscriber::set_global_default(subscriber).map_err(|e| {
    AppError::ServerError(format!(
        "failed to setup Zipkin telemetry tracing subscriber: {e:?}."
    ))
})?;

jpoisso avatar Aug 21 '25 14:08 jpoisso