dhat-rs

Assertion error at startup

Open banool opened this issue 3 years ago • 8 comments

Repro

Get repo, checkout appropriate branch:

git clone git@github.com:aptos-labs/aptos-core.git
cd aptos-core
git switch banool/dhat_test

Run this command:

cargo run --release -p aptos -- node run-local-testnet --force-restart --config-path /tmp/yes_api.yaml --assume-yes

Output: https://gist.github.com/banool/a3aa22bfcf9154c15bc2073306d4fe9c.

This error does not occur with the default allocator / jemalloc.

Details:

  • Rust 1.64
  • macOS 12.6
  • CPU arch: ARM (M1 mac)

banool avatar Oct 24 '22 18:10 banool

Sometimes this error also manifests as a segfault (Bus Error: 10 or Segmentation fault: 11).

banool avatar Oct 24 '22 18:10 banool

Other times it just hangs on startup, without the process ever coming fully alive (in this case, the API never starts listening).

banool avatar Oct 24 '22 18:10 banool

Hi, thanks for the report. Can you try the new 0.3.2 release and see if that fixes it? It has a bugfix for a problem that sounds similar to yours.

nnethercote avatar Oct 31 '22 06:10 nnethercote

Hey, new errors now:

fatal runtime error: assertion failed: thread_info.is_none()
Abort trap: 6

The program I'm using this with uses tokio btw.

banool avatar Nov 02 '22 03:11 banool

I get the same error on Rust 1.65 on an x86 Mac.

afajl avatar Nov 11 '22 13:11 afajl

This is an odd one: I've seen it on my M1 Mac as well, and the source of the message is almost certainly this.

That implies that something is trying to update information associated with a thread, but the thread already has associated information, so it asserts and fails.

I can't reproduce it reliably. I've only seen it happen a handful of times in the last couple of months. AFAICT, that code only runs during thread creation, so I'm not sure why it's being triggered.
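To illustrate the kind of check I mean, here is a simplified sketch of a "set once per thread" slot. This is not the standard library's actual code: `ThreadInfo`, `THREAD_INFO`, and `set_thread_info` are hypothetical names made up for the example, and it returns an error where the real runtime would abort.

```rust
use std::cell::RefCell;
use std::thread;

// Hypothetical stand-in for the per-thread record (not the real std type).
struct ThreadInfo {
    name: String,
}

thread_local! {
    // Each thread gets its own slot, which starts out empty.
    static THREAD_INFO: RefCell<Option<ThreadInfo>> = RefCell::new(None);
}

// Registers info for the current thread.
// Returns Err if this thread already has info, which is the situation
// the `thread_info.is_none()` assertion is guarding against.
fn set_thread_info(name: &str) -> Result<(), String> {
    THREAD_INFO.with(|slot| {
        let mut slot = slot.borrow_mut();
        if let Some(existing) = slot.as_ref() {
            return Err(format!("thread info already set for '{}'", existing.name));
        }
        *slot = Some(ThreadInfo { name: name.to_string() });
        Ok(())
    })
}

fn main() {
    let handle = thread::spawn(|| {
        // First registration on a fresh thread succeeds.
        let first = set_thread_info("worker");
        // A second registration on the same thread is rejected.
        let second = set_thread_info("worker-again");
        (first, second)
    });
    let (first, second) = handle.join().unwrap();
    println!("first: {:?}", first);
    println!("second: {:?}", second);
}
```

In this model the second call fails because the slot is already populated. If something in the real program populates a thread's slot before the normal thread-creation path runs, the assertion would fire in the same way.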

I don't think this helps much, but maybe it points a way to investigate what is happening.

garypen avatar Dec 09 '22 14:12 garypen

I can repro on 0.3.2 on an M1, macOS 13.1, Rust 1.66.0. It triggers for me when I run cargo test, about 4 out of 10 times.

That it's reproducible on x86 probably rules out my pet theory, which was that it's related to time resolution on Apple silicon.

antifuchs avatar Dec 23 '22 20:12 antifuchs

Running tokio after dhat should fix this problem:

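// Setup assumed from the dhat docs: route all allocations through dhat.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Start the profiler before the tokio runtime (and its threads) exists.
    //#[cfg(feature = "dhat-heap")]
    let _profiler = dhat::Profiler::new_heap();

    main2();
}

// The tokio runtime is only built once main2() is called.
#[tokio::main]
async fn main2() {
    println!("Hello, world!");
}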

Unfortunately I can't figure out where the rabbit hole is 😞

Boshen avatar Jan 06 '23 13:01 Boshen