cosmos-sdk icon indicating copy to clipboard operation
cosmos-sdk copied to clipboard

If gas_used > gas_wanted, set gas_used = gas_wanted + 1

Open ValarDragon opened this issue 2 years ago • 5 comments

Summary

Currently the SDK is over-precise in gas counting, in the case where gas_used > gas_wanted. This leads to the order of independent store reads mattering: https://twitter.com/valardragon/status/1554117537230618624?s=20&t=A4bNm3hVvXrfD1ystSh_YQ , meaning you cannot fearlessly reorder function1 and function2 below:

func someInternalMethod(ctx sdk.Context) {
  object1 := readOnlyFunction1(ctx) # consumes 1000 gas
  object2 := readOnlyFunction2(ctx) # consumes 500 gas
  doStuff(ctx, object1, object2)
}

The issue is that suppose my gas limit is 2000, and I've already used 1600 gas before entering someInternalMethod. Right now I will out of gas with gasUsed = 2600 (2600 getting merkelized into the tx results)

If I reorder the methods of someInternalMethod to:

func someInternalMethod(ctx sdk.Context) {
  object2 := readOnlyFunction2(ctx) # consumes 500 gas
  object1 := readOnlyFunction1(ctx) # consumes 1000 gas
  doStuff(ctx, object1, object2)
}

Then I have introduced a state machine incompatibility in the current behavior! This is because I will now out of gas with gasUsed = 2100 (2100 getting merkelized into the tx results)

This is over-precise!

Proposal

I suggest that in the tx results we send over ABCI, we change the logic to be such that:

if gas_used > gas_wanted {
  txresult.GasUsed = gas_wanted + 1
}

This means that we no longer incur this over-precision in gas used amounts. The amount by which gas used is greater than gas wanted isn't really helpful, since what you'd want for debugging is seeing what the entire tx would take, not whatever random intermediate operation we stopped in.

Ideally we wouldn't use gas_wanted + 1, and instead use some out of gas sentinel value, but that requires Tendermint changes + ecosystem integration changes (which should be done, but aren't as quick of a fix)

ValarDragon avatar Aug 01 '22 17:08 ValarDragon

i actually don't like this idea. the precise gas amount can be a useful data for debugging in some cases. forcibly setting gas_used = gas_wanted + 1 basically discards this data.

the drawback for being precise here, as we have seen, is that every time developers reorder some functions, they must make sure to tag it as a breaking change. i'm personally ok with this

larry0x avatar Aug 01 '22 17:08 larry0x

When is it helpful in debugging flows to see the exact operation at which gas used exceeded gas wanted? Whenever I debug gas usage, I see how much gas the message takes with unbounded gas wanted, and then profile where gas was used. (I currently do this via a hacky mechanism, we should make dev tooling for that to be better)

The exact point above the edge is easily debuggable given that trace. I'd rather put focus on emitting more useful gas traces, and getting that into explorer displays, rather than preserve the edge behavior

We can also emit an event in these cases, so it will be easily inspectable to see the precise amount in your data integration layer, as it is today. So amend the logic to be, and then we can see it in CLI + mintscan

if gas_used > gas_wanted {
  oldGasUsed := txresult.GasUsed
  txresult.GasUsed = gas_wanted + 1
  txresult.Events = append(txresult.Events, sdk.NewEvent("too much gas used", fmt.Sprintf("overflow_point: %d", oldGasUsed))
}

ValarDragon avatar Aug 01 '22 17:08 ValarDragon

I finally understand what you meant now haha. So the purist in me says, "Hey, this should be caught in review and is not state-machine compatible!", but I know that's pretty much not practical and very difficult to spot during reviews.

So I do see merit in your suggestion. But such a change definitely warrants a large community buy-in here I think as it impacts everyone.

alexanderbez avatar Aug 01 '22 21:08 alexanderbez

I think its questionable that Tendermint ever merkelized gas_used.

Gas usage is an application level concern, it should only matter as an application property. Gas wanted in today's Tendermint affects block building, gas used will affect application based mempools. Other application usages of gas_used are tx refunds - but again this has much lower sensitivity requirements.

gas_used getting merkelized makes sense iff clients were to light query for this, and we didn't want to make a better flow for how light clients query.

But even then, I don't think it makes sense for light clients to query for the gas_used_ , thats too low level, instead they should query for tx response code. (Success, fail due to out of gas, fail due to other app logic, etc.)

ValarDragon avatar Aug 01 '22 22:08 ValarDragon

Yeah I totally agree with all your points re gasUsed being sent to Tendermint. Just the change proposed I'd like more opinion or buy-in from other folks. Let's bring it up during the architecture call.

alexanderbez avatar Aug 02 '22 13:08 alexanderbez