[Bug] Panic and crashloop
Bug report
One of our indexers suddenly started crashing with the following. We're not sure why.
Relevant log output
thread 'tokio-runtime-worker' panicked at 'failed to parse mappings: Bad magic number (at offset 0)', chain/ethereum/src/capabilities.rs:62:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Feb 22 16:50:34.867 INFO Data source count at start: 2, sgd: 42628, subgraph_id: QmQ4pbd8UFcipKr5z3cYukp6kCj6pYRDYmQv2JYrcLQDBo, component: SubgraphInstanceManager
Panic in tokio task, aborting!
IPFS hash
No response
Subgraph name or link to explorer
No response
Some information to help us out
- [ ] Tick this box if this bug is caused by a regression found in the latest release.
- [ ] Tick this box if this bug is specific to the hosted service.
- [X] I have searched the issue tracker to make sure this issue is not a duplicate.
OS information
None
Added `RUST_BACKTRACE=1` and now I see the following (the raw logs are actually messier than this; I tried to clean them up):
tokio-runtime-worker' panicked at 'failed to parse mappings: Bad magic number (at offset 0), ', sgdchain/ethereum/src/capabilities.rs: :4262862, backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::result::unwrap_failed
3: graph_core::subgraph::instance_manager::SubgraphInstanceManager<S>::build_subgraph_runner::{{closure}}
4: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
5: tokio::runtime::task::core::Core<T, S>
6: tokio::runtime::task::raw::poll
7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
8: tokio::runtime::scheduler::multi_thread::worker::run
9: tokio::runtime::task::raw::poll
10: tokio::runtime::task::UnownedTask<S>::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Panic in tokio task, aborting!
Hey @paymog, is this specific to this subgraph (QmQ4pbd8UFcipKr5z3cYukp6kCj6pYRDYmQv2JYrcLQDBo), or do you see this on multiple deployments?
Turns out this happened because an unbuilt subgraph was deployed into our infra. We mitigated by removing the invalid subgraph. It would be good if graph-node could kill just the particular instance manager thread instead of the whole process (see the sketch below).
I can't quite remember which IPFS hash was causing the issue; it may have been that one or a different one.
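For reference on the "kill just that instance manager" idea: stock tokio already catches a panic inside a spawned task and hands it back through the `JoinHandle` as a `JoinError`, so a supervisor could in principle unassign only the offending deployment. The `Panic in tokio task, aborting!` line in the logs suggests graph-node deliberately escalates a task panic into a process abort instead, which is what turns one bad deployment into a crash loop. A minimal sketch of the per-deployment pattern, using a made-up `run_subgraph` stand-in rather than graph-node's actual `SubgraphInstanceManager`:

```rust
use std::time::Duration;
use tokio::time::sleep;

// Stand-in for a per-deployment runner; it panics the way a real runner
// would if its mappings turned out not to be valid WASM.
async fn run_subgraph(deployment: &str) {
    sleep(Duration::from_millis(10)).await;
    panic!("failed to parse mappings for {deployment}");
}

#[tokio::main]
async fn main() {
    let deployment = "QmQ4pbd8UFcipKr5z3cYukp6kCj6pYRDYmQv2JYrcLQDBo".to_string();
    let handle = tokio::spawn({
        let deployment = deployment.clone();
        async move { run_subgraph(&deployment).await }
    });

    // tokio catches the panic and surfaces it as a JoinError, so only this
    // deployment's runner dies; the rest of the process keeps running.
    match handle.await {
        Ok(()) => println!("runner for {deployment} exited cleanly"),
        Err(e) if e.is_panic() => eprintln!("runner for {deployment} panicked; unassigning just this deployment"),
        Err(e) => eprintln!("runner for {deployment} was cancelled: {e}"),
    }
    println!("the node itself is still alive");
}
```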
What does 'unbuilt subgraph' mean here? Does that mean AssemblyScript was deployed instead of WASM blobs?
Yup! The subgraph was uploaded without first running `graph build`, so it was AssemblyScript (`.ts` files) and not WASM.
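That also lines up with the exact panic message: every WebAssembly binary starts with the 4-byte magic `\0asm`, so handing the parser AssemblyScript source text fails immediately with "Bad magic number (at offset 0)". A small sketch of the kind of up-front check that could turn this into a recoverable error instead of a panic (a hypothetical `check_wasm_magic` helper, not graph-node's actual validation code):

```rust
/// The 4-byte magic every WebAssembly binary starts with: b"\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6D];

/// Hypothetical pre-check: reject anything that is not a WASM binary
/// (e.g. AssemblyScript `.ts` source uploaded without `graph build`)
/// with a recoverable error instead of panicking deep in the parser.
fn check_wasm_magic(bytes: &[u8]) -> Result<(), String> {
    if bytes.len() < 4 || bytes[..4] != WASM_MAGIC {
        return Err(format!(
            "mapping is not a WASM module (starts with {:02x?}); was `graph build` run before deploying?",
            &bytes[..bytes.len().min(4)]
        ));
    }
    Ok(())
}

fn main() {
    // An AssemblyScript source file starts with ordinary text, not \0asm.
    let ts_source = b"import { BigInt } from '@graphprotocol/graph-ts'";
    assert!(check_wasm_magic(ts_source).is_err());

    // A real module starts with the magic followed by the version (1).
    let wasm_header = [0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00];
    assert!(check_wasm_magic(&wasm_header).is_ok());
}
```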
Wild. Is there maybe a bug in graph-cli that deployed AssemblyScript sources? Also, I don't recognize the sgdchain/ethereum/src/capabilities.rs file name in the graph-node sources. Is this crash loop happening in vanilla graph-node?
The subgraph wasn't uploaded using graph-cli; it was uploaded using a custom CLI tool. Whoops, I probably didn't clean up the logs perfectly; I think the sgd prefix on the path is incorrect and the right path is chain/ethereum/src/capabilities.rs. Yup, the crash loop is happening in vanilla graph-node v0.34.0.