
feat: Replay IO traces on estimator for statistics

Open • jakmeier opened this issue 1 year ago • 1 comment

This introduces the ability to replay IO traces on the estimator. Example usage is documented in runtime/runtime-params-estimator/README.md.

It is not used for estimations yet, but it is already available as a subcommand of the estimator. The idea is that a lot of this code can be reused for #7440. Furthermore, replaying on the estimator is required for #7058 and #7059.

jakmeier, Aug 19 '22 18:08

This is mostly code I have already been using for 1-2 months for performance debugging of a variety of problems. Actually, it's only a selected subset of that code, pulled out to (finally) create a PR that introduces an IO trace replaying mechanism to the estimator. The code I didn't include is not ready yet, and I realised I will probably not get around to finishing it any time soon. So it seems better to have the part that is ready reviewed and merged.

@matklad I picked you as a reviewer, hoping you might have some ideas on how to add tests. (So far there are none.) I was thinking about adding a test trace in a separate file and maybe using snapshot testing to check the output of each available command. But I am not really sure.

Also, getting this merged is not the highest priority IMO. Feel free to put it aside until you have a moment to look through it, as it is a relatively large PR introducing a completely new concept to the estimator.

jakmeier, Aug 19 '22 19:08

Running these on a hard-coded example seems like the right way to test it! You'll want to redirect output from stdout to a generic dyn Write to make this nice, though.
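A minimal sketch of the suggested approach, assuming a hypothetical report function `write_db_stats` that takes simple per-operation counts (the real estimator commands and their data types are not shown in this thread). The report writes to any `&mut dyn Write`, so the CLI can pass a locked stdout while a test passes an in-memory `Vec<u8>` and compares the captured text.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

/// Hypothetical report function: writes per-operation DB access counts to
/// any `Write` sink instead of calling `println!` directly. The CLI can pass
/// `&mut io::stdout().lock()`, a test can pass a `Vec<u8>`.
pub fn write_db_stats(counts: &BTreeMap<String, u64>, out: &mut dyn Write) -> io::Result<()> {
    writeln!(out, "DB accesses by operation:")?;
    // BTreeMap iterates in sorted key order, which keeps the output deterministic.
    for (op, n) in counts {
        writeln!(out, "{op}: {n}")?;
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn db_stats_report() {
        let counts = BTreeMap::from([("GET".to_string(), 2), ("SET".to_string(), 1)]);
        let mut buf: Vec<u8> = Vec::new();
        write_db_stats(&counts, &mut buf).unwrap();
        // A snapshot-testing crate could compare `buf` against a stored file instead.
        assert_eq!(
            String::from_utf8(buf).unwrap(),
            "DB accesses by operation:\nGET: 2\nSET: 1\n"
        );
    }
}
```

Keeping the output deterministic (sorted keys, no timestamps) is what makes a comparison against a stored expected output, such as a snapshot file, practical.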

matklad, Aug 25 '22 17:08

Updated this with tests and improved the output a bit. I think it's ready for a complete review now.

@matklad do you want to review this? Or should I ask someone more from the storage team?

jakmeier, Sep 30 '22 17:09

Or should I ask someone more from the storage team?

Good call! I'd be more than happy to not learn more about this infra :D

matklad, Oct 03 '22 11:10

Hey @Longarithm, do you have some time to review this? It's not urgent at all; this should be a very low-priority review. But it is a toolset that was really useful for me during the Sweatcoin analysis, and I think we should merge it, perhaps add some documentation, and hopefully then it becomes useful for more people.

Quick overview:

  • neard view-state has had an option --record-io-trace since #7052
  • IO traces are the input to this tool, which is part of the estimator (I plan to use such traces for IO gas estimation)
  • This PR includes real IO traces as test input, e.g. runtime/runtime-params-estimator/res/75220100-75220101.s0.io_trace
  • The new tool reads the IO traces and summarizes statistics from them
  • Several commands are available, and test output for each command is included.
    • ReceiptDbStats: per-receipt DB access counts, see runtime/runtime-params-estimator/src/snapshots/runtime_params_estimator__replay__tests__ReceiptDbStats-75220100-75220101.s0.io_trace.snap
    • The same naming scheme applies to all other commands (ChunkDbStats, CacheStats, ReceiptCacheStats, ChunkCacheStats, GasCharges)
    • GasCharges is worth pointing out: it compares DB accesses that we charge gas for against DB accesses that are "free" because they happen outside function calls
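To illustrate the distinction GasCharges draws, here is a minimal sketch over a hypothetical, simplified event stream (`Op`, `GasChargeStats` and `classify` are illustrative names, not the estimator's actual types or trace format). It counts a DB access as charged while a function call is active and as free otherwise.

```rust
/// Hypothetical, simplified trace events for this sketch only.
pub enum Op {
    FnCallStart,
    FnCallEnd,
    DbAccess,
}

#[derive(Debug, Default, PartialEq)]
pub struct GasChargeStats {
    /// DB accesses made while a function call is active (gas is charged).
    pub charged: u64,
    /// DB accesses outside any function call (no gas is charged for them).
    pub free: u64,
}

/// Classifies each DB access by whether it happens inside a function call.
pub fn classify(ops: &[Op]) -> GasChargeStats {
    // A depth counter rather than a bool, so unbalanced or nested markers
    // in the trace do not flip the state incorrectly.
    let mut depth: u32 = 0;
    let mut stats = GasChargeStats::default();
    for op in ops {
        match op {
            Op::FnCallStart => depth += 1,
            Op::FnCallEnd => depth = depth.saturating_sub(1),
            Op::DbAccess if depth > 0 => stats.charged += 1,
            Op::DbAccess => stats.free += 1,
        }
    }
    stats
}
```

For example, a trace with one access before a call, two inside it and one after it yields 2 charged and 2 free accesses.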

jakmeier, Oct 04 '22 11:10

Given our plans for benchmarking flat storage, the priority of this has increased. We will need this as a base for replaying IO traces on RocksDB. (The code for replay on RocksDB is not included here, but it builds on the same infrastructure.)

@Longarithm Could you please review it some time this week? Or if you don't have time let me know and I will look for someone else to review it.

jakmeier, Oct 11 '22 07:10

We also need an approval from someone in nearcore-codeowners before this can be merged.

Longarithm, Oct 14 '22 12:10

@Longarithm I changed how the account filter works substantially. Most importantly, data that belongs to no account will be excluded from the reports when an account filter is active. For that, I inverted the skipping behavior. Instead of skipping every account that is not in the filter list, we now skip by default and enable statistics collection once we find a matching account. This makes much more sense to me.^^ Hopefully that makes the code clearer as well.

Also added tests for this.
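The inverted filter logic described above can be sketched as a small state machine. This is an illustration under assumptions: `Event`, `Event::Scope` and `count_db_accesses` are hypothetical names, and the real IO trace is richer than a flat list of scope and access events. With a filter active, collection starts switched off, turns on only inside a scope whose account matches, and stays off for scopes that belong to no account.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified trace events; the real IO trace format is richer.
pub enum Event {
    /// A new scope begins (e.g. a receipt); `None` means data tied to no account.
    Scope(Option<String>),
    /// A DB access inside the current scope.
    DbAccess,
}

/// Counts DB accesses, honoring an optional account filter.
pub fn count_db_accesses(events: &[Event], filter: Option<&HashSet<String>>) -> u64 {
    // Skip by default when a filter is active; collect everything otherwise.
    let mut collecting = filter.is_none();
    let mut count = 0;
    for event in events {
        match event {
            Event::Scope(account) => {
                collecting = match (filter, account) {
                    (None, _) => true,
                    // Enable collection only for accounts in the filter list.
                    (Some(f), Some(acc)) => f.contains(acc),
                    // Data belonging to no account is excluded when filtering.
                    (Some(_), None) => false,
                };
            }
            Event::DbAccess => {
                if collecting {
                    count += 1;
                }
            }
        }
    }
    count
}
```

Starting from "off" means that accesses appearing before the first account scope are also excluded, so nothing that belongs to no account reaches a filtered report.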

jakmeier, Oct 17 '22 13:10