
Lighthouse with beaconmock sometimes fails to download proposer duties

Open · gsora opened this issue 2 years ago · 2 comments

🐞 Bug Report

Description

Running Lighthouse against beaconmock exhibits connection problems, often connection resets and (presumed) early EOFs.

🔬 Minimal Reproduction

Create a compose cluster and watch the logs of any Lighthouse instance with:

docker-compose logs -f vc0-lighthouse | grep download

🔥 Error


ERRO Failed to download proposer duties      err: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("node0")), port: Some(3600), path: "/eth/v1/validator/duties/proposer/1817407", query: None, fragment: None }, source: hyper::Error(Io, Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) })), service: duties
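The innermost cause in this log is `Os { code: 104, kind: ConnectionReset }`; on Linux, errno 104 is ECONNRESET, which Rust's `std::io` maps to `ErrorKind::ConnectionReset`. When triaging logs or deciding whether to retry, it can help to tell resets apart from other request failures. Below is a minimal std-only sketch, assuming the transport error is reachable by walking the `std::error::Error::source()` chain (reqwest and hyper generally wrap errors this way, but treat that as an assumption). `RequestFailed` is a hypothetical wrapper standing in for `reqwest::Error`.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Walks an error's `source()` chain and reports whether any link is an
/// `io::Error` of kind `ConnectionReset` (ECONNRESET, errno 104 on Linux).
fn is_connection_reset(err: &(dyn Error + 'static)) -> bool {
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            if io_err.kind() == io::ErrorKind::ConnectionReset {
                return true;
            }
        }
        current = e.source();
    }
    false
}

/// Hypothetical wrapper standing in for a `reqwest::Error` that carries
/// the underlying transport error as its source.
#[derive(Debug)]
struct RequestFailed {
    source: io::Error,
}

impl fmt::Display for RequestFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request failed")
    }
}

impl Error for RequestFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn main() {
    let reset = RequestFailed {
        source: io::Error::new(io::ErrorKind::ConnectionReset, "Connection reset by peer"),
    };
    let timeout = RequestFailed {
        source: io::Error::new(io::ErrorKind::TimedOut, "timed out"),
    };
    println!("reset: {}", is_connection_reset(&reset));
    println!("timeout: {}", is_connection_reset(&timeout));
}
```

The chain walk, rather than a single downcast, matters because the reset sits two wrappers deep in the log (`reqwest::Error` around `hyper::Error` around the OS error).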

🌍 Your Environment

Operating System:

Ubuntu Linux 22.10

What version of Charon are you running? (Which release)

v0.13.0

Anything else relevant (validator index / public key)?

gsora, Jan 31 '23 13:01

I tried a slightly different version of the minimal HTTP stress tester I wrote in #860, modified to unmarshal the data returned by beaconmock:

use reqwest::{ClientBuilder, StatusCode};
use serde::{Deserialize, Serialize};
use std::time::Duration;

// Response body of GET /eth/v1/validator/duties/proposer/{epoch}.
// The field names already match the JSON keys, so no serde renames are needed.
#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Root {
    pub dependent_root: String,
    pub data: Vec<ProposerDuty>,
    // Tolerate responses that omit this field.
    #[serde(default)]
    pub execution_optimistic: bool,
}

// One entry of `data`: the validator scheduled to propose at `slot`.
#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct ProposerDuty {
    pub pubkey: String,
    pub slot: String,
    pub validator_index: String,
}

#[derive(Debug)]
enum Error {
    Reqwest(reqwest::Error),
    MissingURL,
}

impl From<reqwest::Error> for Error {
    fn from(e: reqwest::Error) -> Self {
        Self::Reqwest(e)
    }
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    // Expects the full proposer duties URL as the first CLI argument.
    let url = std::env::args().nth(1).ok_or(Error::MissingURL)?;

    eprintln!("Fetching {:?}...", url);

    // This enables TCP-level keepalive probes. It is unrelated to HTTP
    // keep-alive (connection reuse), which reqwest's pool enables by default.
    let client = ClientBuilder::new()
        .tcp_keepalive(Duration::from_secs(1))
        .build()?;

    let mut idx: u64 = 0;
    loop {
        // The Beacon API serves proposer duties via GET, not POST.
        let res = client.get(url.as_str()).send().await?;

        if res.status() != StatusCode::OK {
            println!("status code not OK!: {}", res.status());
        }

        // Requires reqwest's "json" feature.
        let j: Root = res.json().await?;

        idx += 1;
        println!("Done call #{}, data: {}", idx, j.dependent_root);
    }
}

I never managed to reproduce the error this way.

Disabling HTTP keep-alive does seem to alleviate the issue inside the Docker Compose cluster, though.
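One common explanation for resets that go away when keep-alive is disabled (an assumption here; the thread does not confirm it) is a race on reused idle connections: the client keeps a pooled connection open, the server closes it after its own idle timeout, and the client's next request goes out on the dead socket and fails with an early EOF or ECONNRESET. That would also fit the stresser never reproducing the error, since its back-to-back requests never leave a connection idle, whereas a validator client polls duties periodically. The std-only sketch simulates that sequence over loopback; `spawn_closing_server` and `request_on` are hypothetical helpers, not beaconmock or reqwest code.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

const BODY: &str = r#"{"dependent_root":"0x00","data":[]}"#;

/// Hypothetical server with an aggressive idle timeout: it answers one request
/// without sending `Connection: close`, then closes the socket anyway.
fn spawn_closing_server() -> io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    thread::spawn(move || {
        if let Ok((stream, _)) = listener.accept() {
            let mut reader = BufReader::new(stream);
            let mut line = String::new();
            // Consume the request headers, up to the blank line.
            while reader.read_line(&mut line).map_or(false, |n| n > 0) {
                if line == "\r\n" {
                    break;
                }
                line.clear();
            }
            let mut stream = reader.into_inner();
            let _ = write!(
                stream,
                "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
                BODY.len(),
                BODY
            );
            // `stream` is dropped here, closing the connection.
        }
    });
    Ok(port)
}

/// Sends a GET on an existing (possibly reused) connection. Returns the status
/// line, or `Ok(None)` if the peer closed the connection first (early EOF).
fn request_on(stream: &mut TcpStream, path: &str) -> io::Result<Option<String>> {
    write!(stream, "GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut status = String::new();
    if reader.read_line(&mut status)? == 0 {
        return Ok(None);
    }
    // Skip the headers, remembering Content-Length so the body can be drained.
    let mut content_length = 0usize;
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 || line == "\r\n" {
            break;
        }
        if let Some(v) = line.to_ascii_lowercase().strip_prefix("content-length:") {
            content_length = v.trim().parse().unwrap_or(0);
        }
    }
    let mut body = vec![0u8; content_length];
    reader.read_exact(&mut body)?;
    Ok(Some(status.trim_end().to_string()))
}

fn main() -> io::Result<()> {
    let port = spawn_closing_server()?;
    let mut conn = TcpStream::connect(("127.0.0.1", port))?;
    // A fresh connection: the first request succeeds.
    println!("first:  {:?}", request_on(&mut conn, "/first")?);
    // Let the server close its end, as an idle timeout would.
    thread::sleep(Duration::from_millis(200));
    // Reusing the dead connection: early EOF (Ok(None)) or a reset (Err).
    println!("second: {:?}", request_on(&mut conn, "/second"));
    Ok(())
}
```

Under this hypothesis, turning off connection reuse avoids the failure because every request gets a fresh connection. In reqwest, `ClientBuilder::pool_max_idle_per_host(0)` is one client-side way to do that; on a Go `net/http` server the counterpart is `Server.SetKeepAlivesEnabled(false)`. Which side was changed in this issue isn't stated.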

gsora, Jan 31 '23 13:01

Story Points: 5. See the discussion in Ongoing Sizing.


thomasheremans, Feb 22 '23 13:02