lighthouse Allow DURATION for monitoring_endpoint to be configurable

Description

For me, beaconcha.in returns StatusCode(429), "Too many requests", in response to the "Sending metrics" request when I use --monitoring-endpoint.

Version

wink@3900x 22-08-26T16:39:54.991Z:~/prgs/rust/forks/lighthouse (stable)
$ git log -1 --format=short
commit 18c61a5e8be3e54226a86a69b96f8f4f7fd790e4 (HEAD -> stable, upstream/stable)
Author: Paul Hauner <[email protected]>

    v3.0.0 (#3464)
wink@3900x 22-08-26T16:40:18.885Z:~/prgs/rust/forks/lighthouse (stable)
$ rustc --version
rustc 1.52.1 (9bc8c42bb 2021-05-09)
wink@3900x 22-08-26T16:40:45.797Z:~/prgs/rust/forks/lighthouse (stable)

Present Behaviour

Aug 26 08:31:15 robert lighthouse[253789]: Aug 26 15:31:15.629 INFO Sending metrics to remote endpoint      endpoint: https://beaconcha.in/, service: monitoring_client
Aug 26 08:31:15 robert lighthouse[253789]: Aug 26 15:31:15.831 ERRO Failed to send metrics to remote endpoint, error: StatusCode(429), service: monitoring_client

The problem is that beaconcha.in has a rate limit of 30,000 requests/mth on the free tier, but in monitoring_api/src/lib.rs UPDATE_DURATION is 60 seconds:

/// Duration after which we collect and send metrics to remote endpoint.
pub const UPDATE_DURATION: u64 = 60;

And in fn auto_update, this constant is the interval between send_metrics calls:

    /// Creates a task which periodically sends the provided process metrics
    /// to the configured remote endpoint.
    pub fn auto_update(self, executor: TaskExecutor, processes: Vec<ProcessType>) {
        let mut interval = interval_at(
            // Have some initial delay for the metrics to get initialized
            Instant::now() + Duration::from_secs(25),
            Duration::from_secs(UPDATE_DURATION),
        );

        info!(self.log, "Starting monitoring api"; "endpoint" => %self.monitoring_endpoint);

        let update_future = async move {
            loop {
                interval.tick().await;
                match self.send_metrics(&processes).await {
                    Ok(()) => {
                        debug!(self.log, "Metrics sent to remote server"; "endpoint" => %self.monitoring_endpoint);
                    }
                    Err(e) => {
                        error!(self.log, "Failed to send metrics to remote endpoint"; "error" => %e)
                    }
                }
            }
        };

        executor.spawn(update_future, "monitoring_api");
    }

This means 1,440 requests per day, or 43,200 per 30-day month, which exceeds the 30,000 limit. In my particular case, beaconcha.in says I'm at 134,000/mth, so they are rightly returning StatusCode(429).
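The request-count arithmetic can be checked with a short sketch. It assumes a 30-day month and that every Lighthouse instance with --monitoring-endpoint enabled sends one request per interval; `requests_per_month` and `min_interval_secs` are hypothetical helpers for illustration, not part of Lighthouse.

```rust
/// Seconds in a 30-day month (the month length assumed in the estimate above).
const SECS_PER_MONTH: u64 = 30 * 24 * 60 * 60;

/// Hypothetical helper: requests per 30-day month when `instances`
/// processes each send one request every `interval_secs` seconds.
fn requests_per_month(interval_secs: u64, instances: u64) -> u64 {
    assert!(interval_secs > 0, "interval must be non-zero");
    (SECS_PER_MONTH / interval_secs) * instances
}

/// Hypothetical helper: smallest whole-second interval that keeps
/// `instances` processes at or below `monthly_limit` requests per month.
fn min_interval_secs(monthly_limit: u64, instances: u64) -> u64 {
    assert!(monthly_limit > 0, "limit must be non-zero");
    // Ceiling of SECS_PER_MONTH * instances / monthly_limit.
    (SECS_PER_MONTH * instances + monthly_limit - 1) / monthly_limit
}

fn main() {
    // One instance at the current 60 s UPDATE_DURATION: 43200/mth.
    println!("60 s, 1 instance:  {}/mth", requests_per_month(60, 1));
    // The proposed 120 s default: 21600/mth.
    println!("120 s, 1 instance: {}/mth", requests_per_month(120, 1));
    // Six instances (3 BNs + 3 VCs) at 60 s each: 259200/mth.
    println!("60 s, 6 instances: {}/mth", requests_per_month(60, 6));
    // Break-even interval for a 30,000/mth limit: 87 s.
    println!("min interval, 1 instance: {} s", min_interval_secs(30_000, 1));
}
```

Under these assumptions a single instance crosses 30,000/mth at any interval shorter than about 87 s, so a 120 s default would leave some headroom for one instance, though not for several instances on the same account.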

Expected Behaviour

An ERRO should not occur

Steps to resolve

I suggest setting the default to 120 s or higher, but ideally this should be configurable. In addition, the documentation should say which metrics are actually sent, for example by linking to the relevant "standards".
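One way "configurable" could look is an optional user-supplied period that falls back to the existing UPDATE_DURATION constant. This is a minimal sketch: the `MonitoringConfig` struct and its `update_period_secs` field are hypothetical, not Lighthouse's actual API; only `UPDATE_DURATION` comes from monitoring_api/src/lib.rs.

```rust
use std::time::Duration;

/// Default taken from monitoring_api/src/lib.rs.
pub const UPDATE_DURATION: u64 = 60;

/// Hypothetical config: `None` means "use the default".
#[derive(Debug, Default)]
pub struct MonitoringConfig {
    pub update_period_secs: Option<u64>,
}

impl MonitoringConfig {
    /// Resolve the period that auto_update would pass to `interval_at`.
    /// A zero period is rejected up front, since tokio's `interval_at`
    /// panics when given a zero period.
    pub fn update_period(&self) -> Result<Duration, String> {
        match self.update_period_secs.unwrap_or(UPDATE_DURATION) {
            0 => Err("monitoring update period must be greater than 0".to_string()),
            secs => Ok(Duration::from_secs(secs)),
        }
    }
}

fn main() {
    let default_cfg = MonitoringConfig::default();
    let custom_cfg = MonitoringConfig { update_period_secs: Some(120) };
    println!("{:?}", default_cfg.update_period()); // Ok(60s)
    println!("{:?}", custom_cfg.update_period()); // Ok(120s)
}
```

With something like this, `Duration::from_secs(UPDATE_DURATION)` in auto_update would be replaced by the resolved period, and the default behaviour stays unchanged for users who never set the option.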

Looking at the code, I see there are 3 ProcessTypes: BeaconNode, System and Validator. I'd guess that using --monitoring-endpoint on a BN sends BeaconNode metrics and on a VC sends Validator metrics, but under what conditions are System metrics sent?

Aug 26 '22 17:08 winksaville

@michaelsproul any thoughts on this issue?

Aug 30 '22 17:08 winksaville

Seems like it should be a straightforward fix. We're pretty flat out getting ready for the merge but can probably squeeze it in for v3.1.1. I'll have a go at it now.

Aug 31 '22 06:08 michaelsproul

Here you go @winksaville: https://github.com/sigp/lighthouse/pull/3530

I wonder whether we should still change the default, as I recall beaconcha.in meters monitoring traffic differently to standard API traffic. CC @Buttaa

Aug 31 '22 07:08 michaelsproul

I think we should keep the default at 1 min but provide an option to override it. We'll check on our end whether any issues cause the metrics traffic to be counted incorrectly.

I'm not sure if you already use this, but we added the possibility to batch all the metrics into one single request on our end (https://github.com/gobitfly/eth2-client-metrics). This could reduce traffic and fix edge cases where one request does not arrive for some reason (for example, system metrics do not arrive while beacon node metrics do).
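A sketch of the batching idea follows. It assumes, based on the eth2-client-metrics repository linked above, that the endpoint accepts a JSON array of per-process metric objects in one POST body. `batch_payload` is a hypothetical helper, and the abbreviated JSON fields are illustrative only; the real schema is defined in that repository.

```rust
/// Hypothetical helper: combine already-serialized per-process metric
/// objects (e.g. beaconnode, validator, system) into one JSON array body,
/// so that one POST replaces several.
fn batch_payload(objects: &[String]) -> String {
    format!("[{}]", objects.join(","))
}

fn main() {
    // Abbreviated objects for illustration; real payloads carry many more fields.
    let beacon = r#"{"process":"beaconnode","version":1}"#.to_string();
    let system = r#"{"process":"system","version":1}"#.to_string();
    let body = batch_payload(&[beacon, system]);
    // Prints: [{"process":"beaconnode","version":1},{"process":"system","version":1}]
    println!("{}", body);
}
```

Because every process's metrics travel in the same body, they either all arrive or none do, which is the edge case described above.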

Aug 31 '22 08:08 manuelsc

@manuelsc, @michaelsproul, in my opinion the default should be 2 min so that the default rate stays below 30,000/mth, but since --monitoring-endpoint-frequency makes it configurable, it's not a big deal either way.

Aug 31 '22 15:08 winksaville

@winksaville The monitoring endpoint has a different API limit from the rest of beaconcha.in's API and is not counted toward your normal limit. We'll review the limit on our end, though, since I'm not sure it is high enough for unbatched metric reports. Ideally Lighthouse adopts the batched metrics approach; otherwise, we need to adjust the custom limit for the endpoint.

Aug 31 '22 15:08 manuelsc

Hmm, what is actually counted toward the request limit on beaconcha.in isn't clear to me at all. All I know is that my account page shows a monthly total of 120,000+ requests/mth, which is above 30,000, along with a link to upgrade my plan, and that beaconcha.in responded to the send-metrics request with a 429 "Too many requests".

So AFAIK the only beaconcha.in API I'm using is POST /api/v1/client/metrics, which is issued when --monitoring-endpoint is enabled. What other information might geth or lighthouse be sending to them?

Note: I'm running three nodes, each with geth, a lighthouse bn and a lighthouse vc. Two nodes are on mainnet: one has 28 enabled validators, and the other has 0 and is a "hot" backup. The third node is associated with another account, is on Prater, has 2 validators, and is registering 72,000 requests/mth. A week ago, when I raised this issue, I had --monitoring-endpoint enabled on all 6 lighthouse instances (3 bns and 3 vcs), but currently no monitoring endpoints are enabled.

Aug 31 '22 16:08 winksaville

> I'm not sure if you already utilize this but we added the possibility to batch all the metrics in one single request on our end:

We do send the beacon, validator and system metrics in a single request.

Aug 31 '22 19:08 pawanjay176

I see. I don't think this is a Lighthouse issue, then. To track two nodes or two machines you need a premium beaconcha.in subscription of Goldfish or higher. Otherwise, the errors you are seeing are perfectly normal, since you have reached the limit of your free tier. If you have a premium subscription, please reach out to us and open an issue at the beaconcha.in explorer repo.

I don't see an issue with Lighthouse's implementation, then. I would suggest leaving the default at 1 min.

Sep 01 '22 10:09 manuelsc

Awesome, I've changed the default back to 1min in the PR

Sep 01 '22 11:09 michaelsproul

> We do send the beacon, validator and system metrics in a single request.

I'd like to be able to do this, as it could save bandwidth and halve the request count!

I've looked at the code, and AFAICT reporting metrics from the beacon-chain sends beacon and system metrics, while reporting metrics from the validator_client sends validator and system metrics.

I'm unable to find a single request that sends all three. Could you point me to the code and explain how I can enable it?
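The behaviour described above (beacon-chain: beacon + system; validator_client: validator + system) can be summarized as a mapping from binary to the ProcessTypes it reports. This is a sketch of that reading of the code, not Lighthouse's implementation; the `Binary` enum and `processes_for` function are hypothetical, and only the three ProcessType variant names come from the thread.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcessType {
    BeaconNode,
    Validator,
    System,
}

/// Hypothetical: which Lighthouse binary is running with --monitoring-endpoint.
#[derive(Debug, Clone, Copy)]
enum Binary {
    BeaconNode,
    ValidatorClient,
}

/// Each binary reports its own process metrics plus system metrics,
/// batched into one request per interval.
fn processes_for(binary: Binary) -> Vec<ProcessType> {
    match binary {
        Binary::BeaconNode => vec![ProcessType::BeaconNode, ProcessType::System],
        Binary::ValidatorClient => vec![ProcessType::Validator, ProcessType::System],
    }
}

fn main() {
    for binary in [Binary::BeaconNode, Binary::ValidatorClient] {
        println!("{:?} -> {:?}", binary, processes_for(binary));
    }
}
```

Under this reading, a machine running both a BN and a VC with the endpoint enabled makes two requests per interval, which is why one combined request for all three process types would halve its request count.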

Sep 01 '22 14:09 winksaville
