Fix `Filecoin.StateReplay` test (or implementation)
Issue summary
The method was disabled in https://github.com/ChainSafe/forest/pull/4429. Once fixed, it should be removed from the filter lists.
Other information and links
Payload to reproduce the failure:
{"jsonrpc":"2.0","id":0,"method":"Filecoin.StateReplay","params":
[[{"/":"bafy2bzacebjderqonlhtcp42rxldzndqm7tkfhuq62qauqseig2kur4tq6qz4"},{"/":"bafy2bzaceaaro3br2ytfoxhjp5fh26b6u6w4uzy6hvx5g3vzkfb6v3twjjtmk"},{"/":"bafy2bzacebqp4axkcpphgt36hhufcjqni6q62ggjfrn2iafjzsjdjdbwyzdby"}],{"/":"bafy2bzaceaekui2v37hfn54wgijzquegmwfwthug4cp3fvvgidpzbdvepthje"}]
}
the only sensible diff:
2024-06-20T07:33:51.7767290Z "MsgRct": {
2024-06-20T07:33:51.7767298Z
2024-06-20T07:33:51.7767551Z - "ExitCode": 0, \\ forest
2024-06-20T07:33:51.7767559Z
2024-06-20T07:33:51.7767717Z + "ExitCode": 10, \\ lotus
2024-06-20T07:33:51.7767725Z
2024-06-20T07:33:51.7767899Z "Return": null,
2024-06-20T07:33:51.7767907Z
2024-06-20T07:33:51.7768067Z "ReturnCodec": 0
2024-06-20T07:33:51.7768075Z
2024-06-20T07:33:51.7768208Z },
The only sensible diff(part of FVM execution trace) comes from FVM. The root cause might be that forest is using [email protected] while lotus-v1.17 is using [email protected] via filecoin-ffi The issue might be automatically resolved when both forest and lotus switch to [email protected]
@LesnyRumcajs, do you think it would be a proper mitigation if we just ignore the ExitCode field for now? Since there's very little we can do about the only diff in responses in this failure.
Note on exit code 10:
/// An internal VM assertion failed.
pub const SYS_ASSERTION_FAILED: ExitCode = ExitCode::new(10);
Note2: the exit code is 0 (success) for forest on both [email protected] and [email protected]. It might be a regression in [email protected] but it takes non-trival effort to verify this in forest with [email protected]
@hanabi1224 I am not convinced this is a proper fix. The goal of those tests is to ensure response parity between Forest and Lotus at specific versions, and not partial parity. I'd rather the test either fails or not, be it from a single byte diff or the entire payload. If there is no parity between Forest and Lotus at one point (like now), we must know it.
@LesnyRumcajs For this particular RPC method, the response is so large(168kb) and complicated that I have to attach the files, everything matches except this exit_code field. We could also wait for a lotus release with [email protected] and check again
I'd rather we wait for a Lotus FVM bump and if it still doesn't match, we investigate it further.
On a side note, did you try it with a Lotus from ~1 month ago, before they bumped the FVM?
On a side note, did you try it with a Lotus from ~1 month ago, before they bumped the FVM?
@LesnyRumcajs good point, I will do that tomorrow
Interestingly, I tried with a local build of Lotus 1.27.0([email protected]), and the exit code is also 10, which differs from the forest.
Some feature flags, FVM options?
Can't reproduce it on a recent snapshot. Do we have one available for testing?
forest-tool api compare --lotus /ip4/127.0.0.1/tcp/1234/http --forest /ip4/127.0.0.1/tcp/2345/http forest_snapshot_calibnet_2025-01-29_height_2361215.forest.car.zst --filter StateReplay -n 1800
| RPC Method | Forest | Lotus |
|----------------------------|--------|-------|
| Filecoin.StateReplay (316) | Valid | Valid |
@elmattic, let's mark it as Filecoin.StateReploy as fixed and re-open it with an offending snapshot uploaded if it happens again.
Note that we've updated our shim here to use V4 for exit codes, so that might have fixed something.
https://github.com/ChainSafe/forest/pull/4991/