zenoh
Core dump when running the 2nd instance of zenohd on the same subnet
ubuntu@ecs-zenoh-yhe-01:~/zenoh/target/release$ ./zenohd --version
The zenoh router v0.5.0-beta.5-224-g87cf763-modified built with rustc 1.51.0-nightly (2987785df 2020-12-28)
Starting the first zenohd instance with the RUST_LOG=debug flag works with no issue:
[2021-02-19T15:14:33Z DEBUG zenoh_router::routing::pubsub] Register subscription /@/router/3C24E4AEED654EC48C521E191957EB19/plugin/storages/backend/* for face 0
[2021-02-19T15:14:33Z DEBUG zenoh_router::routing::pubsub] Register router subscription /@/router/3C24E4AEED654EC48C521E191957EB19/plugin/storages/backend/* (router: 3C24E4AEED654EC48C521E191957EB19)
[2021-02-19T15:14:33Z DEBUG zenoh_router::routing::pubsub] Register peer subscription /@/router/3C24E4AEED654EC48C521E191957EB19/plugin/storages/backend/* (peer: 3C24E4AEED654EC48C521E191957EB19)
[2021-02-19T15:14:33Z DEBUG zenoh_router::routing::router] New face 2
[2021-02-19T15:14:33Z INFO tide::server] Server listening on http://0.0.0.0:8000
Then, starting the second zenohd instance with the RUST_LOG=debug flag on a different machine on the same subnet causes a core dump:
ubuntu@ecs-zenoh-yhe-01:~/zenoh/target/release$ RUST_LOG=debug ./zenohd
[2021-02-19T15:15:00Z DEBUG zenohd] zenohd v0.5.0-beta.5-224-g87cf763-modified built with rustc 1.51.0-nightly (2987785df 2020-12-28)
[2021-02-19T15:15:00Z DEBUG zenoh_router::plugins] Plugins to load: []
[2021-02-19T15:15:00Z DEBUG zenoh_util::lib_loader] Search for libraries libzplugin_*.so to load in ["/usr/local/lib", "/usr/lib", "/home/ubuntu/.zenoh/lib", "/home/ubuntu/zenoh/target/release", "/home/ubuntu/zenoh/target/release"]
[2021-02-19T15:15:00Z DEBUG zenoh_util::lib_loader] Do not load plugin storages from "/home/ubuntu/zenoh/target/release/libzplugin_storages.so" : already loaded.
[2021-02-19T15:15:00Z DEBUG zenoh_util::lib_loader] Do not load plugin rest from "/home/ubuntu/zenoh/target/release/libzplugin_rest.so" : already loaded.
[2021-02-19T15:15:00Z DEBUG zenoh_util::lib_loader] Do not load plugin storages from "/home/ubuntu/zenoh/target/release/libzplugin_storages.so" : already loaded.
[2021-02-19T15:15:00Z DEBUG zenoh_router::plugins] Plugin storages loaded from /usr/lib/libzplugin_storages.so
[2021-02-19T15:15:00Z DEBUG zenoh_router::plugins] Plugin rest loaded from /home/ubuntu/zenoh/target/release/libzplugin_rest.so
[2021-02-19T15:15:00Z DEBUG zenohd] Config: {"multicast_scouting": "true", "peer": "", "listener": "tcp/0.0.0.0:7447", "mode": "router", "add_timestamp": "true"}
[2021-02-19T15:15:00Z INFO zenoh_router::runtime] Using PID: E354454D56274CC4B6EB457750C2D651
[2021-02-19T15:15:00Z DEBUG zenoh_router::routing::network] [Routers network] Add node (self) E354454D56274CC4B6EB457750C2D651
[2021-02-19T15:15:00Z DEBUG zenoh_router::routing::network] [Peers network] Add node (self) E354454D56274CC4B6EB457750C2D651
[2021-02-19T15:15:00Z DEBUG zenoh_router::runtime::orchestrator] Listener tcp/0.0.0.0:7447 added
[2021-02-19T15:15:00Z INFO zenoh_router::runtime::orchestrator] zenohd can be reached on tcp/10.1.101.216:7447
[2021-02-19T15:15:00Z INFO zenoh_router::runtime::orchestrator] zenohd can be reached on tcp/172.17.0.1:7447
[2021-02-19T15:15:00Z DEBUG zenoh_router::runtime::orchestrator] UDP port bound to 224.0.0.224:7447
[2021-02-19T15:15:00Z DEBUG zenoh_router::runtime::orchestrator] Joined multicast group 224.0.0.224
[2021-02-19T15:15:00Z INFO zenoh_router::runtime::orchestrator] zenohd listening scout messages on 224.0.0.224:7447
[2021-02-19T15:15:00Z DEBUG zenoh_router::runtime::orchestrator] UDP port bound to 10.1.101.216:44347
[2021-02-19T15:15:00Z DEBUG zenoh_router::plugins] Start plugin storages
[2021-02-19T15:15:00Z DEBUG zenoh_router::runtime::orchestrator] Waiting for UDP datagram...
[2021-02-19T15:15:00Z DEBUG zenoh_router::plugins] Start plugin rest
thread 'async-std/runtime' panicked at 'range end index 94126439026992 out of range for slice of length 16', zenoh-protocol/src/core/mod.rs:184:10
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
ubuntu@ecs-zenoh-yhe-01:~/zenoh/target/release$
I don't know why GitHub struck out those lines; they are the core dump traces.
With the latest build (02/19/2021, after cargo clean, cargo update, cargo build --release), the second zenohd instance still core dumps:
ubuntu@ecs-zenoh-yhe-01:~/eclipse-zenoh/zenoh/target/release$ git fetch origin master
From https://github.com/eclipse-zenoh/zenoh
 * branch master -> FETCH_HEAD
ubuntu@ecs-zenoh-yhe-01:~/eclipse-zenoh/zenoh/target/release$ RUST_LOG=debug ./zenohd
[2021-02-19T16:31:25Z DEBUG zenohd] zenohd v0.5.0-beta.5-232-gc9974c9-modified built with rustc 1.51.0-nightly (2987785df 2020-12-28)
[2021-02-19T16:31:25Z DEBUG zenoh_router::plugins] Plugins to load: []
[2021-02-19T16:31:25Z DEBUG zenoh_util::lib_loader] Search for libraries libzplugin_*.so to load in ["/usr/local/lib", "/usr/lib", "/home/ubuntu/.zenoh/lib", "/home/ubuntu/eclipse-zenoh/zenoh/target/release", "/home/ubuntu/eclipse-zenoh/zenoh/target/release"]
[2021-02-19T16:31:25Z DEBUG zenoh_util::lib_loader] Do not load plugin storages from "/home/ubuntu/eclipse-zenoh/zenoh/target/release/libzplugin_storages.so" : already loaded.
[2021-02-19T16:31:25Z DEBUG zenoh_util::lib_loader] Do not load plugin rest from "/home/ubuntu/eclipse-zenoh/zenoh/target/release/libzplugin_rest.so" : already loaded.
[2021-02-19T16:31:25Z DEBUG zenoh_util::lib_loader] Do not load plugin storages from "/home/ubuntu/eclipse-zenoh/zenoh/target/release/libzplugin_storages.so" : already loaded.
[2021-02-19T16:31:25Z DEBUG zenoh_router::plugins] Plugin storages loaded from /usr/lib/libzplugin_storages.so
[2021-02-19T16:31:25Z DEBUG zenoh_router::plugins] Plugin rest loaded from /home/ubuntu/eclipse-zenoh/zenoh/target/release/libzplugin_rest.so
[2021-02-19T16:31:25Z DEBUG zenohd] Config: {"listener": "tcp/0.0.0.0:7447", "peer": "", "add_timestamp": "true", "multicast_scouting": "true", "mode": "router"}
[2021-02-19T16:31:25Z INFO zenoh_router::runtime] Using PID: EED898FDAAEF42BC9191286F6C97743C
[2021-02-19T16:31:25Z DEBUG zenoh_router::routing::network] [Routers network] Add node (self) EED898FDAAEF42BC9191286F6C97743C
[2021-02-19T16:31:25Z DEBUG zenoh_router::routing::network] [Peers network] Add node (self) EED898FDAAEF42BC9191286F6C97743C
[2021-02-19T16:31:25Z DEBUG zenoh_router::runtime::orchestrator] Listener tcp/0.0.0.0:7447 added
[2021-02-19T16:31:25Z INFO zenoh_router::runtime::orchestrator] zenohd can be reached on tcp/10.1.101.216:7447
[2021-02-19T16:31:25Z INFO zenoh_router::runtime::orchestrator] zenohd can be reached on tcp/172.17.0.1:7447
[2021-02-19T16:31:25Z DEBUG zenoh_router::runtime::orchestrator] UDP port bound to 224.0.0.224:7447
[2021-02-19T16:31:25Z DEBUG zenoh_router::runtime::orchestrator] Joined multicast group 224.0.0.224
[2021-02-19T16:31:25Z INFO zenoh_router::runtime::orchestrator] zenohd listening scout messages on 224.0.0.224:7447
[2021-02-19T16:31:25Z DEBUG zenoh_router::runtime::orchestrator] UDP port bound to 10.1.101.216:56489
[2021-02-19T16:31:25Z DEBUG zenoh_router::plugins] Start plugin storages
[2021-02-19T16:31:25Z DEBUG zenoh_router::runtime::orchestrator] Waiting for UDP datagram...
[2021-02-19T16:31:25Z DEBUG zenoh_router::plugins] Start plugin rest
thread 'async-std/runtime' panicked at 'range end index 139961064262600 out of range for slice of length 16', zenoh-protocol/src/core/mod.rs:184:10
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
@heyong4725 do you still have this problem with the latest version on master?
@Mallets I haven't had a chance to try it yet. I will try once I have the environment set up.
This is probably due to incompatible storages and/or rest plugins. Maybe @JEnoch or @gabrik can provide more info, as the former investigated those problems and the latter ran into them.
Hi @heyong4725, I indeed had a very similar issue; the cause was an old plugin.
I see this line in your log: "Plugin storages loaded from /usr/lib/libzplugin_storages.so". Maybe that plugin is an old one and is causing the crash.
Can you try to remove that file and restart zenoh?
Yes, indeed. This is due to an old plugin. After I removed /usr/lib/libzplugin_storages.so, there is no more core dump.
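Until there is a proper check, a stale copy like this can be spotted before starting zenohd. The logs above search for libzplugin_*.so in several directories and skip later copies as "already loaded", which suggests the copy found first (here /usr/lib) is the one that gets used. The sketch below is a hypothetical standalone helper (find_duplicate_plugins is not part of zenoh) that lists every plugin name present in more than one search directory; the directory list is copied from the debug log.

```rust
use std::collections::{BTreeMap, HashSet};
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical helper (not a zenoh API): scan the plugin search directories
/// in search order and return every plugin name found in more than one of
/// them, with all the paths where it was found (in search order).
fn find_duplicate_plugins(search_dirs: &[PathBuf]) -> io::Result<BTreeMap<String, Vec<PathBuf>>> {
    let mut seen_dirs: HashSet<&PathBuf> = HashSet::new();
    let mut found: BTreeMap<String, Vec<PathBuf>> = BTreeMap::new();
    for dir in search_dirs {
        // The logged search path lists target/release twice; scan each directory once.
        if !seen_dirs.insert(dir) {
            continue;
        }
        // A missing directory is simply skipped.
        let entries = match fs::read_dir(dir) {
            Ok(entries) => entries,
            Err(err) if err.kind() == io::ErrorKind::NotFound => continue,
            Err(err) => return Err(err),
        };
        for entry in entries {
            let path = entry?.path();
            // Extract <name> from the libzplugin_<name>.so pattern seen in the logs.
            let plugin_name = path
                .file_name()
                .and_then(|n| n.to_str())
                .and_then(|n| n.strip_prefix("libzplugin_"))
                .and_then(|n| n.strip_suffix(".so"))
                .map(str::to_owned);
            if let Some(name) = plugin_name {
                found.entry(name).or_default().push(path);
            }
        }
    }
    // Keep only the plugins that exist in several places.
    found.retain(|_, paths| paths.len() > 1);
    Ok(found)
}

fn main() -> io::Result<()> {
    // Search order as printed by zenoh_util::lib_loader in the log above.
    let search_dirs: Vec<PathBuf> = [
        "/usr/local/lib",
        "/usr/lib",
        "/home/ubuntu/.zenoh/lib",
        "/home/ubuntu/zenoh/target/release",
    ]
    .iter()
    .map(PathBuf::from)
    .collect();

    for (name, paths) in find_duplicate_plugins(&search_dirs)? {
        eprintln!("warning: plugin '{}' found in several directories:", name);
        for path in &paths {
            eprintln!("  {}", path.display());
        }
    }
    Ok(())
}
```

This only reports duplicates; it cannot tell which copy is compatible with the running zenohd.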
Is there any way to avoid the core dump?
By "Is there any way to avoid the core dump?", I mean: can the software detect this kind of error and, with a fail-safe/fail-operational capability, gracefully give a warning and continue without a core dump?
@heyong4725: the root of the problem is that Rust doesn't have a stable ABI, and probably won't for a while. The implication is that there is no guarantee that a Rust type compiled in zenohd has the same memory layout when compiled in a plugin/backend. If the layouts differ, zenohd might exchange incompatible data with the plugins/backends, leading to unpredictable behaviour, including core dumps.
I don't think it's feasible to detect and recover from such incompatible memory representations of types at runtime.
Instead, we tried to ensure that types compiled in both zenohd and the plugins/backends have the same memory representation, by:
- forcing the Rust toolchain to be the same when building zenohd and the plugins/backends (see the rust-toolchain files). But that was not enough, probably because some dependencies used by both might be at different versions.
- ensuring that zenoh and the plugins/backends don't use different versions of a dependency (see the committed Cargo.lock files that list the dependencies to be used). And that seems to work so far...
But the result is that the plugins/backends must have exactly the same version as zenohd (for each release, we'll make sure all of them use the same toolchain and dependencies). Still, I just saw this comment that makes me think that might not be enough:
ABI and even layout can change between any two compiler invocations even if they are 100% identical
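Even if matching versions cannot guarantee matching layouts (per the quote above), a coarse pre-load check could at least turn the stale-plugin case from this issue into a warning instead of a crash. The sketch below is an assumption, not something zenoh does: a plugin would export a small #[repr(C)] record (PluginCompat, plugin_compat, check_compat and the fingerprint value are all hypothetical), and the host would compare it with its own record before calling any function that passes ordinary Rust types. Computing the fingerprint (e.g. from the toolchain and Cargo.lock in a build script) is left out.

```rust
use std::fmt;

/// Hypothetical compatibility record a plugin could export. Only #[repr(C)]
/// fixed-size integer fields, so the host can read it even when the plugin
/// was built differently.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PluginCompat {
    /// Bumped by hand whenever types shared with plugins change.
    pub abi_version: u32,
    /// Hypothetical hash of the toolchain + Cargo.lock used for the build.
    pub build_fingerprint: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CompatError {
    AbiVersion { host: u32, plugin: u32 },
    Fingerprint { host: u64, plugin: u64 },
}

impl fmt::Display for CompatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompatError::AbiVersion { host, plugin } => {
                write!(f, "ABI version mismatch (host {}, plugin {})", host, plugin)
            }
            CompatError::Fingerprint { host, plugin } => {
                write!(f, "build fingerprint mismatch (host {:#x}, plugin {:#x})", host, plugin)
            }
        }
    }
}

/// Compare the plugin's record with the host's own record. Passing this check
/// does NOT prove the layouts match; it only rejects obviously different builds.
pub fn check_compat(host: PluginCompat, plugin: PluginCompat) -> Result<(), CompatError> {
    if host.abi_version != plugin.abi_version {
        return Err(CompatError::AbiVersion { host: host.abi_version, plugin: plugin.abi_version });
    }
    if host.build_fingerprint != plugin.build_fingerprint {
        return Err(CompatError::Fingerprint {
            host: host.build_fingerprint,
            plugin: plugin.build_fingerprint,
        });
    }
    Ok(())
}

/// What a plugin might export (a real plugin would also mark it #[no_mangle]
/// so the host can look the symbol up after loading the library).
pub extern "C" fn plugin_compat() -> PluginCompat {
    PluginCompat { abi_version: 1, build_fingerprint: 0x1234 }
}

fn main() {
    const HOST: PluginCompat = PluginCompat { abi_version: 1, build_fingerprint: 0x1234 };
    // In a real host this value would come from the loaded library.
    match check_compat(HOST, plugin_compat()) {
        Ok(()) => println!("plugin accepted"),
        Err(e) => eprintln!("warning: skipping plugin: {}", e),
    }
}
```

The design choice here is that only #[repr(C)] data crosses the boundary before the check, so the check itself does not depend on the unstable Rust layout it is guarding against.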
We probably need to investigate a more sustainable solution. I had a glance at abi_stable, but it seems to bring a lot of constraints, including a redefinition of the std types (RString, RVec, RSlice...).
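The reason crates like abi_stable redefine std types is that the layouts of &str, String or Vec are unspecified, whereas a #[repr(C)] struct has a fixed field order and layout. The sketch below shows the idea with a minimal borrowed string view; FfiStr and name_len are hypothetical names, and this is not abi_stable's actual definition of its string types (owned types like RString also have to deal with which side frees the memory, which is skipped here).

```rust
use std::marker::PhantomData;
use std::slice;
use std::str;

/// Minimal FFI-safe borrowed string (hypothetical, in the spirit of
/// abi_stable's string types). #[repr(C)] fixes the layout: a pointer
/// followed by a length, whatever compiler built each side.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FfiStr<'a> {
    ptr: *const u8,
    len: usize,
    // Ties the view to the lifetime of the borrowed &str (zero-sized).
    _marker: PhantomData<&'a str>,
}

impl<'a> FfiStr<'a> {
    pub fn new(s: &'a str) -> Self {
        FfiStr { ptr: s.as_ptr(), len: s.len(), _marker: PhantomData }
    }

    pub fn as_str(&self) -> &'a str {
        // SAFETY: the fields are private, so within Rust they can only come
        // from a valid &'a str in `new`. A foreign caller building this
        // struct by hand must uphold the same invariant.
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}

/// A function with a C ABI that a plugin could export: it receives the
/// string through the fixed-layout view instead of a &str.
pub extern "C" fn name_len(name: FfiStr<'_>) -> usize {
    name.as_str().len()
}

fn main() {
    let plugin_name = String::from("storages");
    let view = FfiStr::new(&plugin_name);
    println!("{} ({} bytes)", view.as_str(), name_len(view));
}
```

The cost mentioned above is visible even here: every API crossing the boundary has to be written in terms of such wrapper types instead of the familiar std ones.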
@JEnoch, thanks for the detailed analysis. I like your thinking on a more sustainable solution.
This kind of ABI/interface type incompatibility must be a common issue. I wonder if zenoh needs to use some kind of intermediate representation for this, similar to message passing.
In the WebAssembly ecosystem, especially in the WASI subgroup, there is an effort on this called "Interface Types". Below are a few links for you to evaluate/investigate :)
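To make the message-passing idea concrete: if the host and a plugin only exchanged bytes in an explicitly defined format (over a plain C-ABI call taking a pointer and a length), a mismatch would show up as a decoding error rather than as misread memory. The sketch below assumes a made-up format ([16-byte id][u32 little-endian key length][UTF-8 key]); it is not zenoh's wire format. It also shows the difference with the panic in the logs: `&buf[start..end]` panics with "range end index ... out of range", while `buf.get(start..end)` returns None, which can be turned into an error. We don't know exactly what zenoh-protocol/src/core/mod.rs:184 does; this only illustrates the checked-slicing pattern.

```rust
/// Hypothetical message exchanged between host and plugin as bytes.
#[derive(Debug, PartialEq, Eq)]
pub struct Message {
    pub id: [u8; 16],
    pub key: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    Truncated { needed: usize, available: usize },
    InvalidUtf8,
}

/// Encode as [16-byte id][u32 little-endian key length][key bytes].
pub fn encode(msg: &Message) -> Vec<u8> {
    assert!(msg.key.len() <= u32::MAX as usize, "key too long for a u32 length prefix");
    let mut buf = Vec::with_capacity(16 + 4 + msg.key.len());
    buf.extend_from_slice(&msg.id);
    buf.extend_from_slice(&(msg.key.len() as u32).to_le_bytes());
    buf.extend_from_slice(msg.key.as_bytes());
    buf
}

/// Checked sub-slice: returns an error where `&buf[start..start + len]` would panic.
fn take(buf: &[u8], start: usize, len: usize) -> Result<&[u8], DecodeError> {
    let available = buf.len();
    let end = start
        .checked_add(len)
        .ok_or(DecodeError::Truncated { needed: usize::MAX, available })?;
    buf.get(start..end).ok_or(DecodeError::Truncated { needed: end, available })
}

/// Decode a message, rejecting truncated or corrupted input instead of panicking.
pub fn decode(buf: &[u8]) -> Result<Message, DecodeError> {
    let mut id = [0u8; 16];
    id.copy_from_slice(take(buf, 0, 16)?);
    let len_bytes = take(buf, 16, 4)?;
    let key_len =
        u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    let key_bytes = take(buf, 20, key_len)?;
    let key = std::str::from_utf8(key_bytes)
        .map_err(|_| DecodeError::InvalidUtf8)?
        .to_owned();
    Ok(Message { id, key })
}

fn main() {
    let msg = Message { id: [0xAB; 16], key: "/demo/example".to_string() };
    let bytes = encode(&msg);
    println!("{:?}", decode(&bytes));
    // Truncated input is reported as an error instead of aborting the process.
    println!("{:?}", decode(&bytes[..12]));
}
```

The trade-off is a copy and an encode/decode step on every call, in exchange for not depending on both sides agreeing on the in-memory layout of Rust types.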
https://bytecodealliance.org/articles/1-year-update
https://github.com/WebAssembly/WASI/blob/main/docs/witx.md
https://www.youtube.com/watch?v=LCA9NnH7DxE
When zenoh loads shareable plugins (i.e. backend libraries, future zenoh flow operators), I think there might be a need for signatures/authentication, etc. I wonder if https://crates.io/crates/minisign can be used for this purpose. It is still related to this issue: making sure all these components are aligned and avoiding runtime core dumps.