ethermint
ethermint copied to clipboard
imp(rpc) Add support for EVM RPC metrics
Closes: #XXX
Description
This PR exposes EVM RPC metrics similar to the ones exposed by go-ethereum
.
The metrics correspond to:
- Request duration
- Count of successful calls
- Count of failed calls
- The RPC request calls
This PR creates a separate metrics server to serve the metrics for RPC and will be used to expose other EVM/ application metrics in the future.
Based on this discussion, three approaches are identified to exposing the metrics:
- Enable the metrics in
go-ethereum
by passing a hardcoded flag in the CLI--metrics
. This option will expose metrics that are not relevant to Ethermint and does not allow for a config file option. - Intercept the request after or before it's served
Sample metrics:
# TYPE rpc_duration_all_count counter
rpc_duration_all_count 19954
# TYPE rpc_duration_all summary
rpc_duration_all {quantile="0.5"} 1.1972705e+06
rpc_duration_all {quantile="0.75"} 8.7099585e+06
rpc_duration_all {quantile="0.95"} 8.960733589999996e+07
rpc_duration_all {quantile="0.99"} 1.0390894278e+08
rpc_duration_all {quantile="0.999"} 1.14853960718e+08
rpc_duration_all {quantile="0.9999"} 1.1498575e+08
# TYPE rpc_duration_eth_blockNumber_success_count counter
rpc_duration_eth_blockNumber_success_count 2
# TYPE rpc_duration_eth_blockNumber_success summary
rpc_duration_eth_blockNumber_success {quantile="0.5"} 507
rpc_duration_eth_blockNumber_success {quantile="0.75"} 681
rpc_duration_eth_blockNumber_success {quantile="0.95"} 681
rpc_duration_eth_blockNumber_success {quantile="0.99"} 681
rpc_duration_eth_blockNumber_success {quantile="0.999"} 681
rpc_duration_eth_blockNumber_success {quantile="0.9999"} 681
# TYPE rpc_duration_eth_chainId_success_count counter
rpc_duration_eth_chainId_success_count 34
# TYPE rpc_duration_eth_chainId_success summary
rpc_duration_eth_chainId_success {quantile="0.5"} 1133.5
rpc_duration_eth_chainId_success {quantile="0.75"} 2806.75
rpc_duration_eth_chainId_success {quantile="0.95"} 6268.25
rpc_duration_eth_chainId_success {quantile="0.99"} 6272
rpc_duration_eth_chainId_success {quantile="0.999"} 6272
rpc_duration_eth_chainId_success {quantile="0.9999"} 6272
# TYPE rpc_duration_eth_estimateGas_success_count counter
rpc_duration_eth_estimateGas_success_count 2
# TYPE rpc_duration_eth_estimateGas_success summary
rpc_duration_eth_estimateGas_success {quantile="0.5"} 9705.5
rpc_duration_eth_estimateGas_success {quantile="0.75"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.95"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.99"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.999"} 9715
rpc_duration_eth_estimateGas_success {quantile="0.9999"} 9715
# TYPE rpc_duration_eth_gasPrice_success_count counter
rpc_duration_eth_gasPrice_success_count 3
# TYPE rpc_duration_eth_gasPrice_success summary
rpc_duration_eth_gasPrice_success {quantile="0.5"} 1236
rpc_duration_eth_gasPrice_success {quantile="0.75"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.95"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.99"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.999"} 1362
rpc_duration_eth_gasPrice_success {quantile="0.9999"} 1362
# TYPE rpc_duration_eth_getBalance_success_count counter
rpc_duration_eth_getBalance_success_count 0
# TYPE rpc_duration_eth_getBalance_success summary
rpc_duration_eth_getBalance_success {quantile="0.5"} 0
rpc_duration_eth_getBalance_success {quantile="0.75"} 0
rpc_duration_eth_getBalance_success {quantile="0.95"} 0
rpc_duration_eth_getBalance_success {quantile="0.99"} 0
rpc_duration_eth_getBalance_success {quantile="0.999"} 0
rpc_duration_eth_getBalance_success {quantile="0.9999"} 0
# TYPE rpc_duration_eth_getBlockByNumber_success_count counter
rpc_duration_eth_getBlockByNumber_success_count 3
# TYPE rpc_duration_eth_getBlockByNumber_success summary
rpc_duration_eth_getBlockByNumber_success {quantile="0.5"} 1623
rpc_duration_eth_getBlockByNumber_success {quantile="0.75"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.95"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.99"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.999"} 1784
rpc_duration_eth_getBlockByNumber_success {quantile="0.9999"} 1784
# TYPE rpc_duration_eth_getTransactionByHash_success_count counter
rpc_duration_eth_getTransactionByHash_success_count 99
# TYPE rpc_duration_eth_getTransactionByHash_success summary
rpc_duration_eth_getTransactionByHash_success {quantile="0.5"} 2151
rpc_duration_eth_getTransactionByHash_success {quantile="0.75"} 3531
rpc_duration_eth_getTransactionByHash_success {quantile="0.95"} 5698
rpc_duration_eth_getTransactionByHash_success {quantile="0.99"} 6231
rpc_duration_eth_getTransactionByHash_success {quantile="0.999"} 6231
rpc_duration_eth_getTransactionByHash_success {quantile="0.9999"} 6231
# TYPE rpc_duration_eth_getTransactionCount_success_count counter
rpc_duration_eth_getTransactionCount_success_count 1136
# TYPE rpc_duration_eth_getTransactionCount_success summary
rpc_duration_eth_getTransactionCount_success {quantile="0.5"} 4439
rpc_duration_eth_getTransactionCount_success {quantile="0.75"} 6791.25
rpc_duration_eth_getTransactionCount_success {quantile="0.95"} 10646.649999999996
rpc_duration_eth_getTransactionCount_success {quantile="0.99"} 12842.739999999998
rpc_duration_eth_getTransactionCount_success {quantile="0.999"} 14244.807999999999
rpc_duration_eth_getTransactionCount_success {quantile="0.9999"} 14252
# TYPE rpc_duration_eth_getTransactionReceipt_success_count counter
rpc_duration_eth_getTransactionReceipt_success_count 854
# TYPE rpc_duration_eth_getTransactionReceipt_success summary
rpc_duration_eth_getTransactionReceipt_success {quantile="0.5"} 95
rpc_duration_eth_getTransactionReceipt_success {quantile="0.75"} 434.25
rpc_duration_eth_getTransactionReceipt_success {quantile="0.95"} 18385.749999999978
rpc_duration_eth_getTransactionReceipt_success {quantile="0.99"} 36295.9
rpc_duration_eth_getTransactionReceipt_success {quantile="0.999"} 42889
rpc_duration_eth_getTransactionReceipt_success {quantile="0.9999"} 42889
# TYPE rpc_duration_eth_sendRawTransaction_success_count counter
rpc_duration_eth_sendRawTransaction_success_count 5
# TYPE rpc_duration_eth_sendRawTransaction_success summary
rpc_duration_eth_sendRawTransaction_success {quantile="0.5"} 1169
rpc_duration_eth_sendRawTransaction_success {quantile="0.75"} 1505.5
rpc_duration_eth_sendRawTransaction_success {quantile="0.95"} 1556
rpc_duration_eth_sendRawTransaction_success {quantile="0.99"} 1556
rpc_duration_eth_sendRawTransaction_success {quantile="0.999"} 1556
rpc_duration_eth_sendRawTransaction_success {quantile="0.9999"} 1556
# TYPE rpc_failure gauge
rpc_failure 0
# TYPE rpc_requests gauge
rpc_requests 19954
# TYPE rpc_success gauge
rpc_success 19954
For contributor use:
- [ ] Targeted PR against correct branch (see CONTRIBUTING.md)
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the module structure standards.
- [ ] Wrote unit and integration tests
- [ ] Updated relevant documentation (
docs/
) or specification (x/<module>/spec/
) - [ ] Added relevant
godoc
comments. - [ ] Added a relevant changelog entry to the
Unreleased
section inCHANGELOG.md
- [ ] Re-reviewed
Files changed
in the Github PR explorer
For admin use:
- [ ] Added appropriate labels to PR (ex.
WIP
,R4R
,docs
, etc) - [ ] Reviewers assigned
- [ ] Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)
@facs95 @fedekunze @danburck
After digging deep into go-ethereum code and investigating the different alternatives, here are the different approaches to support metrics on the EVM rpc:
Approach 1:
go-ethereum already has support for metrics including setting up a separate metrics server and exposing metrics. To use that, ethermint can call:
Pros: Easy to integrate in our code and will shift most of the heavy lifting to go-ethereum
Cons:
All the metrics from go-ethereum will be exposed. Some of which are not relevant to ethermint such as consensus related metrics
Need to add --metrics
to our start
command and I don't think ethermint could support a config file option (app.toml) for this
Approach 2:
Create an http middleware that gets called after the request has been processed by go-ethereum which needs to:
- Parse the json message from the http request and expose it as a metric
Pros: Approach is independent of go-ethereum
Cons: Approach results in parsing the message twice as go-ethereum parses it the first time and ethermint will need to parse again Need to copy a lot of the code that parses the http request from go-ethereum to ethermint repo The middleware would be unable to know if the request was successful or not. It could only know how long it took to execute the request and what method was called
Approach 3:
Copy the implementation of ServeHTTP from go-ethereum to ethermint which includes the metrics that are only relevant for the RPC specified here
Pros: Complete independence of go-ethereum in RPC Only relevant metrics are collected and exposed Cleanest approach that produces the most optimal results (no unnecessary metrics and can track if request failed or succeeded)
Cons: Need to create a lot of tests to make sure functionality is maintained Harder to stay synced with go-ethereum upstream Bigger chunk of code to maintain
What do you think?
Codecov Report
Merging #1303 (3491e5a) into main (07202c1) will increase coverage by
0.86%
. The diff coverage is62.27%
.
Additional details and impacted files
@@ Coverage Diff @@
## main #1303 +/- ##
==========================================
+ Coverage 54.78% 55.65% +0.86%
==========================================
Files 108 124 +16
Lines 10015 11342 +1327
==========================================
+ Hits 5487 6312 +825
- Misses 4255 4705 +450
- Partials 273 325 +52
Impacted Files | Coverage Δ | |
---|---|---|
rpc/client/metrics.go | 0.00% <0.00%> (ø) |
|
rpc/client/stdio.go | 0.00% <0.00%> (ø) |
|
rpc/types/callback.go | 0.00% <0.00%> (ø) |
|
rpc/types/decimal.go | 0.00% <0.00%> (ø) |
|
rpc/types/errors.go | 0.00% <0.00%> (ø) |
|
rpc/types/block.go | 31.92% <8.82%> (-18.54%) |
:arrow_down: |
rpc/client/client_opt.go | 20.58% <20.58%> (ø) |
|
rpc/types/subscription.go | 26.47% <26.47%> (ø) |
|
rpc/client/ipc.go | 44.44% <44.44%> (ø) |
|
server/config/config.go | 22.85% <50.00%> (+0.63%) |
:arrow_up: |
... and 9 more |
Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this
@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.
Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this
@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.
go-ethereum expects the --metrics
to be passed in the command line which is hardcoded in init
method in here. if we wanted to support a config file option for it, we would need to parse the config and somehow find a way to inject --metrics
into the command line before the init
method of go-ethereum is called.
Another drawback of this approach is that we would be exposing metrics that are not relevant and would never change (at the mercy of what go-ethereum exposes).
@v-homsi thanks! Let's add the new files into
client/
,server/
, and the other existing folders. We also need to add a new API breaking change entry in the Changelog.We should address the linter warnings and gosec errors as well
The folder I migrated the files to ( rpc ) already exists and I think it might be better to separate by feature rather than functionality (All RPC stuff goes under rpc folder). What do you think?
Converting this PR to draft PR again. I came across https://github.com/ledgerwatch/erigon and I would like to investigate potentially replacing go-ethereum
with erigon
instead of migrating the code as erigon
has different metrics mechanism (the original motivation of the PR).
Let me know if you have any thoughts on this.
Feel free to keep reviewing the PR as I'm waiting for feedback.
Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this
@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.
go-ethereum expects the
--metrics
to be passed in the command line which is hardcoded ininit
method in here. if we wanted to support a config file option for it, we would need to parse the config and somehow find a way to inject--metrics
into the command line before theinit
method of go-ethereum is called.Another drawback of this approach is that we would be exposing metrics that are not relevant and would never change (at the mercy of what go-ethereum exposes).
the Enable
is a public variable, is it possible that we set it explicitly based on the config value?
Need to add --metrics to our start command and I don't think ethermint could support a config file option (app.toml) for this
@v-homsi Why can't we add a config option for this? It seems to me that this approach would be the most efficient long-term, especially if we can simply drop irrelevant metrics.
go-ethereum expects the
--metrics
to be passed in the command line which is hardcoded ininit
method in here. if we wanted to support a config file option for it, we would need to parse the config and somehow find a way to inject--metrics
into the command line before theinit
method of go-ethereum is called. Another drawback of this approach is that we would be exposing metrics that are not relevant and would never change (at the mercy of what go-ethereum exposes).the
Enable
is a public variable, is it possible that we set it explicitly based on the config value?
Thanks for taking a look!
The problem with this is that there's no guarantee that setting Enabled
to true be executed before some of the code in go-ethereum
that relies on it. In case of the RPC metrics, these global metrics will use the NilGauge
instead of StandardGauge
because the code in here is dependent on the Enabled
variable. This results in the metric to always be the default value (0, in the case of RPC).
Interestingly, this metric works fine because it gets registered after the Enabled
flag is set and it's not a global variable.
I created a PR in my fork with the change you mentioned. Feel free to try it out yourself.
After discussions with @facs95 regarding this PR, he suggested to go with approach 1 and try to upstream changes to go-ethereum suitable for our use cases instead of migrating the entire RPC code base.
I created this PR for it
Replaced by - https://github.com/evmos/ethermint/pull/1378