hermes icon indicating copy to clipboard operation
hermes copied to clipboard

Hermes corner-cases related to gas

Open adizere opened this issue 1 year ago • 1 comments

Three errors found while looking throught the logs from an operator.

Error 1: "Too much gas wanted"

The error looks like this:

send_tx_with_account_sequence_retry{id=osmosis-1}: broadcast_tx_sync: Response { code: Err(11), data: Data([]), log: Log("Too much gas wanted: 35000000, maximum is 25000000: out of gas"), hash: transaction::Hash(2E74921779770D2497B5720F6376BC1B75EB231AC66B60E834F84820218C3FBD) }: diagnostic: the gas requirement is higher than the configured maximum gas! please check the Hermes config.toml

Consequence

In this case, the tx never makes it into the mempoool. Relayer's wallet is not being drained. The relayer drop the transaction and the IBC packet is never relayed.

Explanation

The reason why this error occurs is because Hermes is submitting the tx with 35M gas_limit, but the network osmosis-1 disallows transactions that have over a certain amount of gas MaxGasWantedPerTx to be rejected during CheckTx. This seems an osmosis-specific requirements. The default limit is 25M gas.

It seems that Hermes uses 35M gas because it's using default_gas parameter, which is due to account sequence mismatch error. Concretely:

1842 Jul 26 00:03:38 Beastie hermes[1950306]: 2022-07-26T00:03:38.980489Z WARN ThreadId(79) send_tx_with_account_sequence_retry{id=osmosis-1}:estimate_gas: failed to simulate tx, falling back on default gas because the error is potentially recoverable: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 234849, got 234850: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "5311844"} }

This error is ignorable, it can occur when Hermes is submitting multiple transactions in a row. Hermes handles this error by falling back on the default_gas parameter. Then Hermes submit the tx, and the chain refuses to accept it in the mempool. Hermes does not retry to re-submit it later on.

Takeaways

This error is harmless but it allows Hermes to drop packets ("leaky pipeline").

For devs:

Attaching the relevant logs below so that we can reproduce and get a full understanding of the issue in the relayer pipeline.

Relevant logs
  1841 Jul 26 00:03:38 Beastie hermes[1950306]: 2022-07-26T00:03:38.978447Z  INFO ThreadId(161) packet_cmd{src_chain=comdex-1 src_port=transfer src_channel=channel-1 dst_chain=osmosis-1}:relay{odata=c2769fc2 ->Destination @1-3500623; len=1}: assembled batch of 2 message(s)
  1842 Jul 26 00:03:38 Beastie hermes[1950306]: 2022-07-26T00:03:38.980489Z  WARN ThreadId(79) send_tx_with_account_sequence_retry{id=osmosis-1}:estimate_gas: failed to simulate tx, falling back on default gas because the error is potentially recoverable: gRPC call failed with statu       s: status: Unknown, message: "account sequence mismatch, expected 234849, got 234850: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "5311844"} }
  1843 Jul 26 00:03:38 Beastie hermes[1950306]: 2022-07-26T00:03:38.982446Z ERROR ThreadId(79) send_tx_with_account_sequence_retry{id=osmosis-1}: broadcast_tx_sync: Response { code: Err(11), data: Data([]), log: Log("Too much gas wanted: 35000000, maximum is 25000000: out of gas"),        hash: transaction::Hash(8BB34D036B1F8D78414CC01E40B9D156A9D3A75867B1B8F0A715ADA83967125B) }: diagnostic: the gas requirement is higher than the configured maximum gas! please check the Hermes config.toml
  1844 Jul 26 00:03:38 Beastie hermes[1950306]: 2022-07-26T00:03:38.982473Z  INFO ThreadId(161) packet_cmd{src_chain=comdex-1 src_port=transfer src_channel=channel-1 dst_chain=osmosis-1}:relay{odata=c2769fc2 ->Destination @1-3500623; len=1}: [Async~>osmosis-1] response(s): 1; Err(11       ):8BB34D036B1F8D78414CC01E40B9D156A9D3A75867B1B8F0A715ADA83967125B
  1845 Jul 26 00:03:38 Beastie hermes[1950306]: 2022-07-26T00:03:38.982491Z  INFO ThreadId(161) packet_cmd{src_chain=comdex-1 src_port=transfer src_channel=channel-1 dst_chain=osmosis-1}:relay{odata=c2769fc2 ->Destination @1-3500623; len=1}: success

Todos:

  • [ ] Hermes should retry submitting this transaction.
  • [ ] Hermes should not report "success" when this error occurs, because the tx was not relayed successfully.

For operators:

  • [ ] Parametrize default_gas for network osmosis-1, so that Hermes uses that value (instead of max_gas) when failing to estimate tx fees.

If default_gas is parametrized to a sensible value, eg. 5M gas or so, then the above error would be fixed.


Error 2: "out of gas in location" during CheckTx

The error looks like this:

send_tx_with_account_sequence_retry{id=juno-1}: broadcast_tx_sync: Response { code: Err(11), data: Data([]), log: Log("out of gas in location: ReadFlat; gasWanted: 214791, gasUsed: 215586: out of gas"), hash: transaction::Hash(B0A1B7CEEBEF3A8465F1162325F17A88C9292FEC13A16F46678C404AC3A0A7E5) }: diagnostic: the gas requirement is higher than the configured maximum gas! please check the Hermes config.toml

Consequence

Identical to the first error.

Explanation

The explanation is similar to above. The difference is that it is not MaxGasWantedPerTx step, but another verification step as part of CheckTx, that is preventing this transaction from entering the mempool. On the Hermes side, the problem appears not because of the missing (or misconfigured) default_gas parameter, but a misconfigured gas_multiplier parameter. In this case, the tx gas estimate step was successful, and Hermes is consequently submitting the transaction with the estimated gas 214791, but this estimated gas is insufficient. Hermes needs to multiply the estimated gas by gas_multiplier, which is likely 1.0, leading to this problem.

Takeaways

Logs
148808 Jul 26 23:49:26 Beastie hermes[657836]: 2022-07-26T23:49:26.359284Z  INFO ThreadId(1152) packet_cmd{src_chain=axelar-dojo-1 src_port=transfer src_channel=channel-4 dst_chain=juno-1}:relay{odata=96222b42 ->Destination @1-3158643; len=1}: assembled batch of 2 message(s)
148809 Jul 26 23:49:26 Beastie hermes[657836]: 2022-07-26T23:49:26.364128Z ERROR ThreadId(82) send_tx_with_account_sequence_retry{id=juno-1}: broadcast_tx_sync: Response { code: Err(11), data: Data([]), log: Log("out of gas in location: ReadFlat; gasWanted: 214791, gasUsed: 215586: out of gas"), hash: transaction::Hash(B0A1B7CEEBEF3A8465F1162325F17A88C9292FEC13A16F46678C404AC3A0A7E5) }: diagnostic: the gas requirement is higher than the configured maximum gas! please check the Hermes config.toml
148810 Jul 26 23:49:26 Beastie hermes[657836]: 2022-07-26T23:49:26.364158Z  INFO ThreadId(1152) packet_cmd{src_chain=axelar-dojo-1 src_port=transfer src_channel=channel-4 dst_chain=juno-1}:relay{odata=96222b42 ->Destination @1-3158643; len=1}: [Async~>juno-1] response(s): 1; Err(11) :B0A1B7CEEBEF3A8465F1162325F17A88C9292FEC13A16F46678C404AC3A0A7E5
148811 Jul 26 23:49:26 Beastie hermes[657836]: 2022-07-26T23:49:26.364171Z  INFO ThreadId(1152) packet_cmd{src_chain=axelar-dojo-1 src_port=transfer src_channel=channel-4 dst_chain=juno-1}:relay{odata=96222b42 ->Destination @1-3158643; len=1}: success

For devs:

Todo: Not much we can do to change the Hermes logic here, adding a retry mechanism would not help. We can, however, help operators by making the relayer more intelligent.

  • [ ] Hermes should report a warning for SDK-based chains that gas_multiplier of value 1.0 is likely going to cause "out of gas" errors. This can be part of the the runtime startup routine (health check/semantic config validation).
  • [ ] Hermes should not report "success" when this error occurs, because the tx was not relayed successfully.

For operators:

  • [ ] Parametrize gas_multiplier for network juno-1 to be at least 1.1, so that Hermes adjusts the estimated gas prior to submitting a transaction.

Error 3: "out of gas in location" during DeliverTx

Similar to the second error, but in this case the transaction enters the mempool and hence it costs operators money.

The log looks as follows, affecting the client worker:

INFO ThreadId(202) refresh{client=07-tendermint-11 src_chain=cosmoshub-4 dst_chain=fetchhub-4}: [cosmoshub-4 -> fetchhub-4:07-tendermint-11] client requires refresh INFO ThreadId(89) wait_for_block_commits: waiting for commit of tx hashes(s) B103A66D288FC546D60AAFDD21228780C0332AF78D9E8D1D64BECEF3897B1D9E id=fetchhub-4

The ThreadId(89) never reports any error. Instead, Hermes keeps on retrying to refresh the client, and keeps on failing.

On Mintscan, we see the transaction failed https://www.mintscan.io/fetchai/txs/1E2B91BFCC0A370B022C0EC114D644C9F9C1402A3694ACFAAF6369245F6E9E08

out of gas in location: WritePerByte; gasWanted: 457223, gasUsed: 460610: out of gas

Consequence

Clients do not get refreshed, potentially leading to expired clients. Operators waste funds on unsuccessful transactions.

Explanation

I think what happens here is again Hermes is using the wrong gas_multiplier configuration, but the consequences are more serious (as described above): The transactions is not rejected early on (at CheckTx), but enters the blockchain and consumes gas.

Reproducing

We can reproduce the error locally in a simple way, without involving the client worker. Use the following patch.

Hermes patch to submit transactions with incorrect gas
diff --git a/relayer/src/chain/cosmos/gas.rs b/relayer/src/chain/cosmos/gas.rs
index 679ab8467..c0156529c 100644
--- a/relayer/src/chain/cosmos/gas.rs
+++ b/relayer/src/chain/cosmos/gas.rs
@@ -78,13 +78,16 @@ fn adjust_estimated_gas(
         return 0;
     };

-    // If the multiplier is 1, no need to perform the multiplication
-    if gas_multiplier == 1.0 {
-        return min(gas_amount, max_gas);
-    }
+    // // If the multiplier is 1, no need to perform the multiplication
+    // if gas_multiplier == 1.0 {
+    //     return min(gas_amount, max_gas);
+    // }
+
+    let omultiplier = 0.8;
+    tracing::warn!("overwrote the gas multiplier to be 0.8");

     // Multiply the gas estimate by the gas_multiplier option
-    let (_sign, digits) = mul_floor(gas_amount, gas_multiplier).to_u64_digits();
+    let (_sign, digits) = mul_floor(gas_amount, omultiplier).to_u64_digits();

Then:

$ hermes update client --host-chain ibc-1 --client 07-tendermint-0
SUCCESS [
    ChainError(
        "deliver_tx for B6D97298EF8E50E77EF3FFEC71AB3D8E3F3AD08930E9A834D43E56FBCA406B51 reports error: code=Err(11), log=Log(\"out of gas in location: WriteFlat; gasWanted: 70813, gasUsed: 71397: out of gas\")",
    ),
]

Takeaways

For devs:

In production the problem is related to the client worker. Like the production logs show, this worker never prints the error, and simply ignores it. I confirmed by looking at the code.

Full logs from production
Jul 26 09:47:25 Beastie hermes[1950306]: 2022-07-26T09:47:25.094791Z  INFO ThreadId(202) refresh{client=07-tendermint-11 src_chain=cosmoshub-4 dst_chain=fetchhub-4}: [cosmoshub-4 -> fetchhub-4:07-tendermint-11] client requires refresh
Jul 26 09:47:25 Beastie hermes[1950306]: 2022-07-26T09:47:25.198566Z  INFO ThreadId(89) wait_for_block_commits: waiting for commit of tx hashes(s) B103A66D288FC546D60AAFDD21228780C0332AF78D9E8D1D64BECEF3897B1D9E id=fetchhub-4
Jul 26 09:47:31 Beastie hermes[1950306]: 2022-07-26T09:47:31.353407Z  INFO ThreadId(202) refresh{client=07-tendermint-11 src_chain=cosmoshub-4 dst_chain=fetchhub-4}: [cosmoshub-4 -> fetchhub-4:07-tendermint-11] client requires refresh
Jul 26 09:47:31 Beastie hermes[1950306]: 2022-07-26T09:47:31.444498Z  INFO ThreadId(89) wait_for_block_commits: waiting for commit of tx hashes(s) 7C7D1436C667AC8BA355546E2DFEB3F004DF6C902CF8E177F7B4820A3C6BBC4E id=fetchhub-4
Jul 26 09:47:37 Beastie hermes[1950306]: 2022-07-26T09:47:37.268683Z  INFO ThreadId(202) refresh{client=07-tendermint-11 src_chain=cosmoshub-4 dst_chain=fetchhub-4}: [cosmoshub-4 -> fetchhub-4:07-tendermint-11] client requires refresh
Jul 26 09:47:37 Beastie hermes[1950306]: 2022-07-26T09:47:37.372955Z  INFO ThreadId(89) wait_for_block_commits: waiting for commit of tx hashes(s) 76E0ADD2492BECA18564C3934F3EBAACEC159491FEA41F1194D270226A9C9924 id=fetchhub-4
Jul 26 09:47:43 Beastie hermes[1950306]: 2022-07-26T09:47:43.497602Z  INFO ThreadId(202) refresh{client=07-tendermint-11 src_chain=cosmoshub-4 dst_chain=fetchhub-4}: [cosmoshub-4 -> fetchhub-4:07-tendermint-11] client requires refresh
Jul 26 09:47:43 Beastie hermes[1950306]: 2022-07-26T09:47:43.596528Z  INFO ThreadId(89) wait_for_block_commits: waiting for commit of tx hashes(s) 1E2B91BFCC0A370B022C0EC114D644C9F9C1402A3694ACFAAF6369245F6E9E08 id=fetchhub-4

Beside reporting the errors in the logs, the relayer operator who reported this suggests that the client worker should not be as stubborn:

We have had this occur a few times with different chains. Adding or increasing gas_multiplier seems to solve the issue, but it is usually after several hundred failed transactions when we notice the issue. Would be great if there was some detection / limit for these retries as clearly they won't succeed if they have already failed 10/100/500 times.

  • [ ] Hermes client worker should not discard & ignore errors, but at least report them in the logs
  • [ ] In case an error appears with the client worker refresh, consider adding a backoff window to prevent refreshing the same client too aggressively.

For operators:

  • [ ] Parametrize gas_multiplier for network fetchhub-4 to be at least 1.1, so that Hermes adjusts the estimated gas prior to submitting a transaction.

adizere avatar Aug 01 '22 14:08 adizere

If default_gas is parametrized to a sensible value, eg. 5M gas or so, then the above error would be fixed.

It should probably be < 1M because of the osmosis higher gas price for "big" transactions.

Hermes should report a warning for SDK-based chains that gas_multiplier of value 1.0 is likely going to cause "out of gas" errors.

Should we restrict it to >= 1.1?

ancazamfir avatar Aug 04 '22 11:08 ancazamfir