minitrace-rust icon indicating copy to clipboard operation
minitrace-rust copied to clipboard

Bug: Missing BE silently compile and run.

Open taqtiqa-mark opened this issue 2 years ago • 13 comments

These integration tests are landing in PR #123. Currently a missing block expression silently compiles and runs. ~~yields different pass/fail behavior in the async/sync cases.~~

cargo test spans -- --nocapture &>log.txt

The documentation is silent (issue #125 ) so it is not clear what is to be expected. ~~considered broken behavior.~~

~~I'm guessing the async failure shown here is "wrong", but the silent omission of spans that have been configured also seems wrong, so I'm not sure, maybe this case span output should be considered correct, and the expected value is wrong....~~

use minitrace::trace;
use test_utilities::*;

// With no block expression the span "test-span" is silently omitted.
// Reference:
// - https://github.com/tikv/minitrace-rust/issues/125
#[trace("a-span")]
async fn f(a: u32) -> u32 {
    a
}

#[tokio::main]
async fn main() {
    let (root, collector) = minitrace::Span::root("root");
    //{
    let _child_span = root.set_local_parent()
    f(1).await;
    //}
    drop(root);
    let records: Vec<minitrace::collector::SpanRecord> = futures::executor::block_on(collector.collect());

    let expected = r#"[
    SpanRecord {
        id: 1,
        parent_id: 0,
        begin_unix_time_ns: \d+,
        duration_ns: \d+,
        event: "root",
        properties: [],
    },
]"#;
    let actual = normalize_spans(records);
    assert_eq_text!(expected, &actual);
}

~~generates this failure:~~

[0mtest [0m[1mtests/spans/no-be-local-async.rs[0m ... [0m[1m[31merror
[0m[31mTest case failed at runtime.
[0m
[0m[1m[31mSTDERR:
[0m[31m┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
[0m[31mLeft:
[0m[31m[
[0m[31m    SpanRecord {
[0m[31m        id: 1,
[0m[31m        parent_id: 0,
[0m[31m        begin_unix_time_ns: \d+,
[0m[31m        duration_ns: \d+,
[0m[31m        event: "root",
[0m[31m        properties: [],
[0m[31m    },
[0m[31m]
[0m[31m
[0m[31mRight:
[0m[31m[
[0m[31m    SpanRecord {
[0m[31m        id: 1,
[0m[31m        parent_id: 0,
[0m[31m        begin_unix_time_ns: \d+,
[0m[31m        duration_ns: \d+,
[0m[31m        event: "root",
[0m[31m        properties: [],
[0m[31m    },
[0m[31m    SpanRecord {
[0m[31m        id: 2,
[0m[31m        parent_id: 1,
[0m[31m        begin_unix_time_ns: \d+,
[0m[31m        duration_ns: \d+,
[0m[31m        event: "a-span",
[0m[31m        properties: [],
[0m[31m    },
[0m[31m]
[0m[31m
[0m[31mDiff:
[0m[31m[
[0m[31m    SpanRecord {
[0m[31m        id: 1,
[0m[31m        parent_id: 0,
[0m[31m        begin_unix_time_ns: \d+,
[0m[31m        duration_ns: \d+,
[0m[31m        event: "root",
[0m[31m        properties: [],
[0m[31m    },
[0m[31m[42m    SpanRecord {
[0m[31m        id: 2,
[0m[31m        parent_id: 1,
[0m[31m        begin_unix_time_ns: \d+,
[0m[31m        duration_ns: \d+,
[0m[31m        event: "a-span",
[0m[31m        properties: [],
[0m[31m    },
[0m[31m[0m]
[0m[31m
[0m[31m
[0m[31mthread 'main' panicked at 'text differs', /home/hedge/src/minitrace-rust/minitrace-macro/tests/spans/no-be-local-async.rs:33:5
[0m[31mnote: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[0m[31m┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
[0m

~~However, this sync version passes:~~

use minitrace::trace;
use test_utilities::*;

// With no block expression the span "a-span" is silently omitted.
// Reference:
// - https://github.com/tikv/minitrace-rust/issues/125
#[trace("a-span")]
fn f(a: u32) -> u32 {
    a
}

fn main() {
    let (root, collector) = minitrace::Span::root("root");
    //{
    let _sg1 = root.set_local_parent();
    f(1);
    //}
    drop(root);
    let records: Vec<minitrace::collector::SpanRecord> =
        futures::executor::block_on(collector.collect());
    let expected = r#"[
    SpanRecord {
        id: 1,
        parent_id: 0,
        begin_unix_time_ns: \d+,
        duration_ns: \d+,
        event: "root",
        properties: [],
    },
]"#;
    let actual = normalize_spans(records);
    assert_eq_text!(expected, &actual);
}

~~Giving:~~

test [0m[1mtests/spans/no-be-local-sync.rs[0m ... [0m[32mok

~~I should also note the threads "scope" tests omit the second SpanRecord which is currently considered 'correct', and they are green:~~

[0mtest [0m[1mtests/spans/no-be-threads-async.rs[0m ... [0m[32mok
[0mtest [0m[1mtests/spans/no-be-threads-sync.rs[0m ... [0m[32mok
```~~

Hopefully I haven't overlooked anything obvious.

taqtiqa-mark avatar Mar 22 '22 17:03 taqtiqa-mark

@zhongzc Seems that the guard of set_local_parent has come across the await point.... I think that might be a bug.

andylokandy avatar Mar 23 '22 15:03 andylokandy

Thanks for taking the time to look at this @andylokandy. I'd like to understand some of the code a little better - is there a unit test that covers this behavior that I can try come to grips with?

I won't be able to fix this - in case that wasn't obvious ;) - but I would like to understand.

taqtiqa-mark avatar Mar 23 '22 19:03 taqtiqa-mark

Not a bug.

To make things easier, we decided to let async functions with #[trace("a-span")] construct thread-safe Span while sync functions with #[trace("a-span")] construct LocalSpan.

Only if users provide arg enter_on_poll = true, async function construct LocalSpan.

If you make a diff:

-#[trace("a-span")]
+#[trace("a-span", enter_on_poll = true)]
async fn f(a: u32) -> u32 {

I think the test result will be as your expected.

zhongzc avatar Mar 24 '22 07:03 zhongzc

Ok, I start to realize that it's not a bug...

In fact, there should not be a difference between sync and async because the async code example you've presented should not compile as I intended... Because set_local_parent is not supposed to be used in an async fn, but since it complied, I've been wondering if it's a bug.

Now I know why: tokio::main doesn't enforce Future: Send. If you put the async example into tokio::Runtime::spawn(), which does enforce, the example should successfully be rejected by the compiler.

Then a new question popped up: how to reject set_local_parent in an async fn that is not enforced to be Send?

andylokandy avatar Mar 24 '22 09:03 andylokandy

@taqtiqa-mark Good catch! It's an awesome discovery.

andylokandy avatar Mar 24 '22 09:03 andylokandy

OK, I'm a little confused. Are block expressions required by Minitrace or not?

taqtiqa-mark avatar Mar 24 '22 12:03 taqtiqa-mark

Not a bug.

@zhongzc apologies for the unfocused report. I'm struggling to work out what is expected behavior and what is unexpected, and the 1different from three others suggested it might be a bug in the one case.

From what you've said I have two questions:

  1. why is this not a bug with the three other cases: None of them produce the second span "a-span"?
  2. If they are correctly not producing the second span, shouldn't there be some indication/warning error that the function is running but not being traced?

taqtiqa-mark avatar Mar 24 '22 12:03 taqtiqa-mark

OK, I'm a little confused. Are block expressions required by Minitrace or not?

Yes and no. Dropping the guard returned from set_local_parent() before calling collector.collect() is required. And the block expression is one of the ways to achieve that.

why is this not a bug with the three other cases: None of them produce the second span "a-span"?

they are correctly not producing the span because the guard returned from set_local_parent() is not dropped.

If they are correctly not producing the second span, shouldn't there be some indication/warning error that the function is running but not being traced?

Good question. I'd like to do so but I haven't come up with a solution.

andylokandy avatar Mar 24 '22 14:03 andylokandy

Good question. I'd like to do so but I haven't come up with a solution.

Ack these things are hard - as a great man once said "There are no solutions. Only trade offs".

Here, printing a warning to std out when there are no traces other than "root" is a workaround that can land quite quickly. It has the advantage returning a helpful "Did you omit a block expression or omit to drop the span before calling collect()".

This (sharp) edge case is one many new users will encounter?
In terms of performance impact - well it is a workaround until a fix lands?

taqtiqa-mark avatar Mar 24 '22 22:03 taqtiqa-mark

printing a warning to std out when there are no traces other than "root"

This seems like the only solution... But I'm not sure whether someone does want a single root span... @zhongzc What do you think?

andylokandy avatar Mar 25 '22 10:03 andylokandy

printing a warning to std out when there are no traces other than "root"

This seems like the only solution... But I'm not sure whether someone does want a single root span... @zhongzc What do you think?

We need to improve the API design. The entire experience from @taqtiqa-mark helps us a lot.

set_local_parent is bad. Calling it properly requires users to figure out the inner implementation.

There is a draft design from @breeswish. I think it's a good starting point for us to discuss a new better API design.

zhongzc avatar Mar 25 '22 10:03 zhongzc

Thanks for the link. I like how @breeswish has delineated the scenarios, and I think that covers the scope.

I think it'd be useful to iterate the design via actual code - write the code you wish you had. This could be done in a branch where PR can land and be iterated on?

taqtiqa-mark avatar Mar 26 '22 09:03 taqtiqa-mark

This seems like the only solution... But I'm not sure whether someone does want a single root span... @zhongzc What do you think?

I'm still in the macro code, so this is conjecture.... does the local span really exist floating free of the span root? I expected drop(root) would cause all child spans to be dropped?

Is there a performance impact to having drop(root) drop child spans?

Is there a use case the child can outlive the root span? If so can you give a minimal example - it'd be useful to add to the integration tests.

taqtiqa-mark avatar Mar 26 '22 09:03 taqtiqa-mark

Closing this issue since the internal structure has been entirely refactored and it can not be reprocueded.

andylokandy avatar Jul 18 '23 14:07 andylokandy