ethermint icon indicating copy to clipboard operation
ethermint copied to clipboard

imp(rpc) Add support for EVM RPC metrics

Open v-homsi opened this issue 2 years ago • 8 comments

Closes: #XXX

Description

This PR exposes EVM RPC metrics similar to the ones exposed by go-ethereum.

The metrics correspond to:

  1. Request duration
  2. Count of successful calls
  3. Count of failed calls
  4. The RPC request calls

This PR creates a separate metrics server to serve the metrics for RPC and will be used to expose other EVM/ application metrics in the future.

Based on this discussion, three approaches are identified to exposing the metrics:

  1. Enable the metrics in go-ethereum by passing a hardcoded flag in the CLI --metrics. This option will expose metrics that are not relevant to Ethermint and does not allow for a config file option.
  2. Intercept the request after or before it's served

Sample metrics:

# TYPE rpc_duration_all_count counter
rpc_duration_all_count 19954

# TYPE rpc_duration_all summary
rpc_duration_all {quantile="0.5"} 1.1972705e+06
rpc_duration_all {quantile="0.75"} 8.7099585e+06
rpc_duration_all {quantile="0.95"} 8.960733589999996e+07
rpc_duration_all {quantile="0.99"} 1.0390894278e+08
rpc_duration_all {quantile="0.999"} 1.14853960718e+08
rpc_duration_all {quantile="0.9999"} 1.1498575e+08

# TYPE rpc_duration_eth_blockNumber_success_count counter
rpc_duration_eth_blockNumber_success_count 2

# TYPE rpc_duration_eth_blockNumber_success summary
rpc_duration_eth_blockNumber_success {quantile="0.5"} 507
rpc_duration_eth_blockNumber_success {quantile="0.75"} 681
rpc_duration_eth_blockNumber_success {quantile="0.95"} 681
rpc_duration_eth_blockNumber_success {quantile="0.99"} 681
rpc_duration_eth_blockNumber_success {quantile="0.999"} 681
rpc_duration_eth_blockNumber_success {quantile="0.9999"} 681

# TYPE rpc_duration_eth_chainId_success_count counter
rpc_duration_eth_chainId_success_count 34

# TYPE rpc_duration_eth_chainId_success summary
rpc_duration_eth_chainId_success {quantile="0.5"} 1133.5
rpc_duration_eth_chainId_success {quantile="0.75"} 2806.75
rpc_duration_eth_chainId_success {quantile="0.95"} 6268.25
rpc_duration_eth_chainId_success {quantile="0.99"} 6272
rpc_duration_eth_chainId_success {quantile="0.999"} 6272
rpc_duration_eth_chainId_success {quantile="0.9999"} 6272

# TYPE rpc_duration_eth_estimateGas_success_count counter
rpc_duration_eth_estimateGas_success_count 2

# TYPE rpc_duration_eth_estimateGas_success summary
rpc_duration_eth_estimateGas_success {quantile="0.5"} 9705.5
rpc_duration_eth_estimateGas_success {quantile="0.75"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.95"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.99"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.999"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.9999"} 9715

# TYPE rpc_duration_eth_gasPrice_success_count counter
rpc_duration_eth_gasPrice_success_count 3

# TYPE rpc_duration_eth_gasPrice_success summary
rpc_duration_eth_gasPrice_success {quantile="0.5"} 1236
rpc_duration_eth_gasPrice_success {quantile="0.75"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.95"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.99"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.999"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.9999"} 1362

# TYPE rpc_duration_eth_getBalance_success_count counter
rpc_duration_eth_getBalance_success_count 0

# TYPE rpc_duration_eth_getBalance_success summary
rpc_duration_eth_getBalance_success {quantile="0.5"} 0
rpc_duration_eth_getBalance_success {quantile="0.75"} 0
rpc_duration_eth_getBalance_success {quantile="0.95"} 0
rpc_duration_eth_getBalance_success {quantile="0.99"} 0
rpc_duration_eth_getBalance_success {quantile="0.999"} 0
rpc_duration_eth_getBalance_success {quantile="0.9999"} 0

# TYPE rpc_duration_eth_getBlockByNumber_success_count counter
rpc_duration_eth_getBlockByNumber_success_count 3

# TYPE rpc_duration_eth_getBlockByNumber_success summary
rpc_duration_eth_getBlockByNumber_success {quantile="0.5"} 1623
rpc_duration_eth_getBlockByNumber_success {quantile="0.75"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.95"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.99"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.999"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.9999"} 1784

# TYPE rpc_duration_eth_getTransactionByHash_success_count counter
rpc_duration_eth_getTransactionByHash_success_count 99

# TYPE rpc_duration_eth_getTransactionByHash_success summary
rpc_duration_eth_getTransactionByHash_success {quantile="0.5"} 2151
rpc_duration_eth_getTransactionByHash_success {quantile="0.75"} 3531
rpc_duration_eth_getTransactionByHash_success {quantile="0.95"} 5698
rpc_duration_eth_getTransactionByHash_success {quantile="0.99"} 6231
rpc_duration_eth_getTransactionByHash_success {quantile="0.999"} 6231
rpc_duration_eth_getTransactionByHash_success {quantile="0.9999"} 6231

# TYPE rpc_duration_eth_getTransactionCount_success_count counter
rpc_duration_eth_getTransactionCount_success_count 1136

# TYPE rpc_duration_eth_getTransactionCount_success summary
rpc_duration_eth_getTransactionCount_success {quantile="0.5"} 4439
rpc_duration_eth_getTransactionCount_success {quantile="0.75"} 6791.25
rpc_duration_eth_getTransactionCount_success {quantile="0.95"} 10646.649999999996
rpc_duration_eth_getTransactionCount_success {quantile="0.99"} 12842.739999999998
rpc_duration_eth_getTransactionCount_success {quantile="0.999"} 14244.807999999999
rpc_duration_eth_getTransactionCount_success {quantile="0.9999"} 14252

# TYPE rpc_duration_eth_getTransactionReceipt_success_count counter
rpc_duration_eth_getTransactionReceipt_success_count 854

# TYPE rpc_duration_eth_getTransactionReceipt_success summary
rpc_duration_eth_getTransactionReceipt_success {quantile="0.5"} 95
rpc_duration_eth_getTransactionReceipt_success {quantile="0.75"} 434.25
rpc_duration_eth_getTransactionReceipt_success {quantile="0.95"} 18385.749999999978
rpc_duration_eth_getTransactionReceipt_success {quantile="0.99"} 36295.9
rpc_duration_eth_getTransactionReceipt_success {quantile="0.999"} 42889
rpc_duration_eth_getTransactionReceipt_success {quantile="0.9999"} 42889

# TYPE rpc_duration_eth_sendRawTransaction_success_count counter
rpc_duration_eth_sendRawTransaction_success_count 5

# TYPE rpc_duration_eth_sendRawTransaction_success summary
rpc_duration_eth_sendRawTransaction_success {quantile="0.5"} 1169
rpc_duration_eth_sendRawTransaction_success {quantile="0.75"} 1505.5
rpc_duration_eth_sendRawTransaction_success {quantile="0.95"} 1556
rpc_duration_eth_sendRawTransaction_success {quantile="0.99"} 1556
rpc_duration_eth_sendRawTransaction_success {quantile="0.999"} 1556
rpc_duration_eth_sendRawTransaction_success {quantile="0.9999"} 1556

# TYPE rpc_failure gauge
rpc_failure 0

# TYPE rpc_requests gauge
rpc_requests 19954

# TYPE rpc_success gauge
rpc_success 19954


For contributor use:

  • [ ] Targeted PR against correct branch (see CONTRIBUTING.md)
  • [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • [ ] Code follows the module structure standards.
  • [ ] Wrote unit and integration tests
  • [ ] Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • [ ] Added relevant godoc comments.
  • [ ] Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • [ ] Re-reviewed Files changed in the Github PR explorer

For admin use:

  • [ ] Added appropriate labels to PR (ex. WIP, R4R, docs, etc)
  • [ ] Reviewers assigned
  • [ ] Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

v-homsi avatar Aug 26 '22 03:08 v-homsi

@facs95 @fedekunze @danburck

After digging deep into go-ethereum code and investigating the different alternatives, here are the different approaches to support metrics on the EVM rpc:

Approach 1:

go-ethereum already has support for metrics including setting up a separate metrics server and exposing metrics. To use that, ethermint can call:

  • Call Exp.Setup(address string) from here
  • pass --metrics in the CLI (hardcoded) as per this

Pros: Easy to integrate in our code and will shift most of the heavy lifting to go-ethereum

Cons: All the metrics from go-ethereum will be exposed. Some of which are not relevant to ethermint such as consensus related metrics Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this

Approach 2:

Create an http middleware that gets called after the request has been processed by go-ethereum which needs to:

  • Parse the json message from the http request and expose it as a metric

Pros: Approach is independent of go-ethereum

Cons: Approach results in parsing the message twice as go-ethereum parses it the first time and ethermint will need to parse again Need to copy a lot of the code that parses the http request from go-ethereum to ethermint repo The middleware would be unable to know if the request was successful or not. It could only know how long it took to execute the request and what method was called

Approach 3:

Copy the implementation of ServeHTTP from go-ethereum to ethermint which includes the metrics that are only relevant for the RPC specified here

Pros: Complete independence of go-ethereum in RPC Only relevant metrics are collected and exposed Cleanest approach that produces the most optimal results (no unnecessary metrics and can track if request failed or succeeded)

Cons: Need to create a lot of tests to make sure functionality is maintained Harder to stay synced with go-ethereum upstream Bigger chunk of code to maintain

What do you think?

v-homsi avatar Sep 05 '22 16:09 v-homsi

Codecov Report

Merging #1303 (3491e5a) into main (07202c1) will increase coverage by 0.86%. The diff coverage is 62.27%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1303      +/-   ##
==========================================
+ Coverage   54.78%   55.65%   +0.86%     
==========================================
  Files         108      124      +16     
  Lines       10015    11342    +1327     
==========================================
+ Hits         5487     6312     +825     
- Misses       4255     4705     +450     
- Partials      273      325      +52     
Impacted Files Coverage Δ
rpc/client/metrics.go 0.00% <0.00%> (ø)
rpc/client/stdio.go 0.00% <0.00%> (ø)
rpc/types/callback.go 0.00% <0.00%> (ø)
rpc/types/decimal.go 0.00% <0.00%> (ø)
rpc/types/errors.go 0.00% <0.00%> (ø)
rpc/types/block.go 31.92% <8.82%> (-18.54%) :arrow_down:
rpc/client/client_opt.go 20.58% <20.58%> (ø)
rpc/types/subscription.go 26.47% <26.47%> (ø)
rpc/client/ipc.go 44.44% <44.44%> (ø)
server/config/config.go 22.85% <50.00%> (+0.63%) :arrow_up:
... and 9 more

codecov[bot] avatar Sep 19 '22 03:09 codecov[bot]

Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this

@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.

0a1c avatar Sep 19 '22 16:09 0a1c

Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this

@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.

go-ethereum expects the --metrics to be passed in the command line which is hardcoded in init method in here. if we wanted to support a config file option for it, we would need to parse the config and somehow find a way to inject --metrics into the command line before the init method of go-ethereum is called.

Another drawback of this approach is that we would be exposing metrics that are not relevant and would never change (at the mercy of what go-ethereum exposes).

v-homsi avatar Sep 19 '22 16:09 v-homsi

@v-homsi thanks! Let's add the new files into client/, server/, and the other existing folders. We also need to add a new API breaking change entry in the Changelog.

We should address the linter warnings and gosec errors as well

The folder I migrated the files to ( rpc ) already exists and I think it might be better to separate by feature rather than functionality (All RPC stuff goes under rpc folder). What do you think?

v-homsi avatar Sep 29 '22 04:09 v-homsi

Converting this PR to draft PR again. I came across https://github.com/ledgerwatch/erigon and I would like to investigate potentially replacing go-ethereum with erigon instead of migrating the code as erigon has different metrics mechanism (the original motivation of the PR). Let me know if you have any thoughts on this. Feel free to keep reviewing the PR as I'm waiting for feedback.

v-homsi avatar Oct 07 '22 00:10 v-homsi

Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this

@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.

go-ethereum expects the --metrics to be passed in the command line which is hardcoded in init method in here. if we wanted to support a config file option for it, we would need to parse the config and somehow find a way to inject --metrics into the command line before the init method of go-ethereum is called.

Another drawback of this approach is that we would be exposing metrics that are not relevant and would never change (at the mercy of what go-ethereum exposes).

the Enable is a public variable, is it possible that we set it explicitly based on the config value?

yihuang avatar Oct 11 '22 15:10 yihuang

Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this

@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.

go-ethereum expects the --metrics to be passed in the command line which is hardcoded in init method in here. if we wanted to support a config file option for it, we would need to parse the config and somehow find a way to inject --metrics into the command line before the init method of go-ethereum is called. Another drawback of this approach is that we would be exposing metrics that are not relevant and would never change (at the mercy of what go-ethereum exposes).

the Enable is a public variable, is it possible that we set it explicitly based on the config value?

Thanks for taking a look!

The problem with this is that there's no guarantee that setting Enabled to true be executed before some of the code in go-ethereum that relies on it. In case of the RPC metrics, these global metrics will use the NilGauge instead of StandardGauge because the code in here is dependent on the Enabled variable. This results in the metric to always be the default value (0, in the case of RPC).

Interestingly, this metric works fine because it gets registered after the Enabled flag is set and it's not a global variable.

I created a PR in my fork with the change you mentioned. Feel free to try it out yourself.

v-homsi avatar Oct 11 '22 23:10 v-homsi

After discussions with @facs95 regarding this PR, he suggested to go with approach 1 and try to upstream changes to go-ethereum suitable for our use cases instead of migrating the entire RPC code base.

I created this PR for it

v-homsi avatar Oct 13 '22 03:10 v-homsi

Replaced by - https://github.com/evmos/ethermint/pull/1378

facs95 avatar Oct 14 '22 21:10 facs95