optimism op-challenger memory leak

The memory usage of op-challenger gradually increases the longer it runs. While the rate of increase is slow and isn't going to cause any actual issues, it's expected that the system should be in a fairly steady state, particularly for networks that have been running fault proofs for a fair while like op-sepolia.

We should investigate the increase in memory and either understand why it's happening (and that it will flatten out given more time) or fix whatever memory leak is causing it.

Jun 25 '24 01:06 ajsutton

:) Is it fixed?

Jul 16 '24 10:07 lenny0x

Not yet, but it's very slow so not a threat.

Jul 16 '24 23:07 ajsutton

Interestingly, the memory usage does actually flatten out after about 4 weeks of running. The game window is 28 days, so I wonder if games which are already completed when challenger starts use less memory than ones that are initially in progress and then later complete. The player created for already complete games is a minimal instance: https://github.com/ethereum-optimism/optimism/blob/db61d2bbee7f36c20e778c3b9f2eef68b4fad8c7/op-challenger/game/fault/player.go#L92-L105

Potentially we could replace the player.act value with a do-nothing version once the game is completed, which would allow the agent, trace provider and other components to be released. The one risk (which applies at startup too) is if there's an L1 reorg and the game is "uncompleted" again later. It wouldn't be playable as the chess clocks would have all expired, but it may require a new transaction to be sent to resolve the game. It would be very rare for the existing resolve tx to not just get included on the fork at some point though.

Sep 10 '24 00:09 ajsutton

Turns out we're already caching the game status so replacing the act function with a noop won't make any difference.

Sep 10 '24 00:09 ajsutton