Panic when connecting to newly initialized database
## Description
I'm trying to write a simple program that wraps `tiup playground --mode tikv-slim`, starting the database and initializing it with values, which is useful for testing my other code. I found that if you connect to the database too early, it panics with the message:
```
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/xendergo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tikv-client-0.2.0/src/pd/cluster.rs:243:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
The problem can be reproduced like this:
```rust
use std::process::Command;

#[tokio::main]
async fn main() {
    Command::new("tiup")
        .args(["playground", "--mode", "tikv-slim"])
        .spawn()
        .expect("Spawning to work");

    let db = loop {
        if let Ok(v) = tikv_client::TransactionClient::new(vec!["127.0.0.1:2379"]).await {
            break v;
        }
    };
}
```
## Fixes?
The code causing the panic could be fixed with proper error handling:
```rust
// ...
let previous_leader = previous.leader.as_ref().ok_or_else(|| Error::Whatever)?;
// ...
```
That said, I have no idea how the code in question works, and the problem could be caused by a more fundamental issue.
## Workarounds
One workaround is to replace the error-checking loop with a `tokio::time::sleep` call long enough for the database to get itself going:
```rust
#[tokio::main]
async fn main() {
    std::process::Command::new("tiup")
        .args(["playground", "--mode", "tikv-slim"])
        .spawn()
        .expect("Spawning to work");

    tokio::time::sleep(std::time::Duration::from_secs(10)).await;

    let db = tikv_client::TransactionClient::new(vec!["127.0.0.1:2379"])
        .await
        .expect("Connecting to the database to work");
}
```
Another is to use `futures::FutureExt::catch_unwind` (wrapping the future in `AssertUnwindSafe`):
```rust
use futures::FutureExt;

#[tokio::main]
async fn main() {
    std::process::Command::new("tiup")
        .args(["playground", "--mode", "tikv-slim"])
        .spawn()
        .expect("Spawning to work");

    let db = loop {
        // `catch_unwind` yields Ok(inner_result) unless the future panicked,
        // so a successful connection shows up as Ok(Ok(v)).
        if let Ok(Ok(v)) = std::panic::AssertUnwindSafe(tikv_client::TransactionClient::new(vec![
            "127.0.0.1:2379",
        ]))
        .catch_unwind()
        .await
        {
            break v;
        }
    };

    println!("Done");
}
```
We are running into the same issue as well. It looks like PD responds with its members, but in certain situations it doesn't populate the leader field, and the client panics. The panics cause startup failures in our service if it tries to bootstrap before the TiKV stack (PD + TiKV) is in a consistent state. Can we return an error here instead of panicking, as @Xendergo suggested? That way we can specify retry logic on the caller's end.
Let me try to fix it.