tracing icon indicating copy to clipboard operation
tracing copied to clipboard

bug(tracing-subscriber): Stack overflow dropping deeply nested spans

Open kjvalencik opened this issue 4 years ago • 4 comments

Bug Report

tracing-subscriber with the registry feature enabled causes a stack overflow when recursively dropping deeply nested spans.

Version

├── tracing v0.1.22
│   ├── tracing-attributes v0.1.11 (proc-macro)
│   └── tracing-core v0.1.17
└── tracing-subscriber v0.2.15
    ├── tracing v0.1.22 (*)
    ├── tracing-core v0.1.17 (*)
    ├── tracing-log v0.1.1
    │   └── tracing-core v0.1.17 (*)
    └── tracing-serde v0.1.2
        └── tracing-core v0.1.17 (*)

Platform

macOS 11.1

Crates

tracing-subscriber with registry feature

Description

[package]
name = "tracing-test"
version = "0.1.0"
edition = "2018"

[dependencies]
env_logger= "*"
log = "*"
tracing = "*"
tracing-subscriber = "*"
use tracing::Span;

#[tracing::instrument()]
fn instrumented() {
    log::info!("Operation");
}

fn wrapper(span: Span) {
    let _ = span.enter();
    instrumented();
}

fn main() {
    tracing_subscriber::fmt::init();

    let depth = std::env::args()
        .skip(1)
        .next()
        .map(|arg| arg.parse::<usize>())
        .unwrap_or(Ok(10))
        .unwrap();

    eprintln!("Depth: {}", depth);

    let mut span = tracing::info_span!("root");

    for _ in 0..depth {
        span = tracing::info_span!(parent: &span, "wrapper");
        wrapper(span.clone());
    }

    eprintln!("Dropping spans");
}
RUST_LOG=info cargo run -- 1000 > /dev/null
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/tracing-test 1000`

Depth: 1000
Dropping spans
RUST_LOG=info cargo run -- 10000 > /dev/null
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/tracing-test 10000`
Depth: 10000
Dropping spans

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Abort trap: 6

Release requires more depth, presumably because of optimized stack usage, but still overflows.

RUST_LOG=info cargo run --release -- 100000 > /dev/null
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/tracing-test 100000`
Depth: 100000
Dropping spans

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Abort trap: 6

I had this issue in recursive async code. Because of async and heap allocation, stack overflow was not an issue for the untraced code, but was with tracing enabled.

kjvalencik avatar Dec 22 '20 16:12 kjvalencik

Thanks for reporting this. Can you include a backtrace for this? I have a suspicion as to what's happening here, but it'd be good to get confirmation via a backtrace.

davidbarsky avatar Dec 22 '20 17:12 davidbarsky

top-backtrace.txt

It's a very large backtrace. I included the top 1000 lines. Let me know if the entire backtrace would be useful.

The EXC_BAD_ACCESS is because it's overflowing the stack guard. It will fail at a fairly arbitrary section of the code depending on the size of the spans and other stack usage. I can also try reproducing with a smaller stack size.

Edit: Here's a full backtrace with --release and my stack size set to 128.

backtrace.txt

kjvalencik avatar Dec 22 '20 17:12 kjvalencik

Still very much a problem I believe! I've submitted a reproducible test here: https://github.com/tokio-rs/tracing/pull/2199.

I can potentially work on a fix, but, a) no promises :), and, b) I would appreciate a pointer to where exactly I should add a "drop a tree of parents in a loop" code.

matklad avatar Jul 01 '22 16:07 matklad

Although, taking a step back here:

The reason why we were hitting stack overflows is because we had buggy code which created infinitedely deep span trees. The bug was similar to https://onesignal.com/blog/solving-memory-leaks-in-rust/.

So perhaps a fix for that problem would be for tracing too emit some kind of error event if it detects an unreasonably deep span tree?

matklad avatar Jul 01 '22 17:07 matklad