
Find a way to restore or replace cimetrics

eddyashton opened this issue.

We disabled cimetrics in #6125, to avoid use of METRICS_MONGO_CONNECTION.

We should look at restoring cimetrics, either with improved auth to mongo, or by finding an alternative storage mechanism.

eddyashton, Apr 16 '24

It looks like bencher.dev may be what we want.

Test job: https://github.com/microsoft/CCF/blob/bencher_experiment/.github/workflows/bencher.yml. It works on Ubuntu 20.04 using a 1ES pool.

The JSON adapter supports both latency and throughput, and the format is very simple: https://bencher.dev/docs/explanation/adapters/#-json
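For reference, a minimal sketch of producing a file in the Bencher Metric Format (BMF) that this JSON adapter accepts: benchmark name, then measure slug (such as the built-in `latency` and `throughput`), then a metric object with a `value`. The benchmark name and numbers below are placeholders, not real results, and `to_bmf` is a hypothetical helper.

```python
import json


def to_bmf(results):
    """Build a Bencher Metric Format (BMF) document.

    `results` maps a benchmark name to {measure slug: value}, e.g. the
    built-in "latency" and "throughput" measures.
    """
    return {
        name: {slug: {"value": float(value)} for slug, value in measures.items()}
        for name, measures in results.items()
    }


# Placeholder benchmark name and values, for illustration only.
results = {"pi_basic_virtual": {"throughput": 1000, "latency": 250}}
print(json.dumps(to_bmf(results), indent=2))
```

The resulting JSON file is what the perf job would hand to Bencher's JSON adapter.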

I think we would move the performance tests to GHA for simplicity, and start by running them only on main. Setting this up for PRs on branches seems to need some care to avoid exposing secrets, but we may be able to do without it and just check main periodically.

achamayou, May 10 '24

pi_basic_virtual_cft

  • [x] Don't need the ^ anymore
  • [x] Can we create a Memory slug for the _mem?
  • [x] Move other end to end tests
  • [x] Use high/low values?
  • [x] Micro-benchmarks?
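A sketch of how the `_mem` variant and the high/low values could map onto BMF. It assumes each raw metric arrives as a list of samples, and that a custom measure with the hypothetical slug `memory` has been created in the Bencher project. Low and high are taken as the min and max of the samples here; a percentile band would work just as well.

```python
import statistics


def summarise(samples):
    """Summarise repeated samples as a BMF metric with a low/high band."""
    return {
        "value": statistics.mean(samples),
        "lower_value": min(samples),
        "upper_value": max(samples),
    }


def fold_mem_metrics(raw):
    """Fold "<name>_mem" entries into a "memory" measure on "<name>".

    Everything else is reported under the built-in "throughput" measure.
    "memory" is a hypothetical custom measure slug.
    """
    out = {}
    for name, samples in raw.items():
        if name.endswith("_mem"):
            base, slug = name[: -len("_mem")], "memory"
        else:
            base, slug = name, "throughput"
        out.setdefault(base, {})[slug] = summarise(samples)
    return out


# Placeholder samples, for illustration only.
raw = {"pi_basic_virtual": [990.0, 1000.0, 1010.0], "pi_basic_virtual_mem": [512.0, 520.0]}
print(fold_mem_metrics(raw))
```

Keeping memory as a second measure on the same benchmark, rather than a separate benchmark name, lets both series share one entry in Bencher.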

achamayou, May 10 '24

Everything has been moved over now. I think there are probably two last things to explore before closing this task:

  • [x] Rename the metrics to something more sensible.
  • [x] Split the micro-benchmarks into a separate slug, since they're on a fundamentally different scale from the end-to-end tests. This could still be quite generic, e.g. Rate.

Proposed renames:

  • CHAMP get (/s) -> CHAMP get
  • CHAMP put (/s) -> CHAMP put
  • KV deser (/s) -> KV deserialisation
  • KV ser (/s) -> KV serialisation
  • KV snap deser (/s) -> KV snapshot deserialisation
  • KV snap ser (/s) -> KV snapshot serialisation
  • RB get (/s) -> RBMap get
  • RB put (/s) -> RBMap put
  • commit_latency_ratio -> Commit
    • For this one, I think we should ideally move away from a ratio to real latency, and split it into multiple runs by signature interval, e.g. Commit (100ms sig_ms_interval)
  • historical_queries -> Historical Queries
  • pi_basic_js_virtual -> Basic JS
  • pi_basic_mt_virtual -> Basic Multi-threaded
  • pi_basic_virtual -> Basic
  • pi_ls_jwt_virtual -> Logging JWT
  • pi_ls_virtual -> Logging
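A sketch of applying the proposed renames at upload time, keeping the old names as lookup keys. As a hypothetical convention, it assumes every `(/s)` metric is a micro-benchmark reported under a separate `rate` measure (per the split proposed above), and that everything else reports under `throughput`. `commit_benchmark_name` illustrates the per-signature-interval naming for Commit, with the interval values left to the caller. All helper names here are hypothetical.

```python
# Proposed renames from the list above (commit_latency_ratio is handled separately).
RENAMES = {
    "CHAMP get (/s)": "CHAMP get",
    "CHAMP put (/s)": "CHAMP put",
    "KV deser (/s)": "KV deserialisation",
    "KV ser (/s)": "KV serialisation",
    "KV snap deser (/s)": "KV snapshot deserialisation",
    "KV snap ser (/s)": "KV snapshot serialisation",
    "RB get (/s)": "RBMap get",
    "RB put (/s)": "RBMap put",
    "historical_queries": "Historical Queries",
    "pi_basic_js_virtual": "Basic JS",
    "pi_basic_mt_virtual": "Basic Multi-threaded",
    "pi_basic_virtual": "Basic",
    "pi_ls_jwt_virtual": "Logging JWT",
    "pi_ls_virtual": "Logging",
}


def measure_for(old_name):
    # Hypothetical convention: "(/s)" metrics are micro-benchmark rates,
    # everything else is end-to-end throughput.
    return "rate" if old_name.endswith("(/s)") else "throughput"


def rename(old_name, value):
    """Return (new benchmark name, BMF measures) for one legacy metric."""
    new_name = RENAMES.get(old_name, old_name)  # unknown names pass through unchanged
    return new_name, {measure_for(old_name): {"value": float(value)}}


def commit_benchmark_name(sig_ms_interval):
    """Name for a Commit latency run at a given signature interval (in ms)."""
    return f"Commit ({sig_ms_interval}ms sig_ms_interval)"


print(rename("KV snap ser (/s)", 1.0))
print(commit_benchmark_name(100))
```

Keeping the mapping in one table means the old names can stay unchanged in the test scripts while the dashboard shows the new ones.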

  • [x] Post throughput from pi_ls tests

achamayou, Jun 05 '24

Saving the last copy of the trend plots before we drop the storage account:

[Image: trend plots]

achamayou, Jun 12 '24

After review with @eddyashton, we want in addition:

  • [x] Check the way MT (multi-threaded) numbers are reported; they look low
  • [x] Add TPCC
  • [x] Take a look at Basic too, which seems to have dropped, perhaps because of programmability

achamayou, Jun 12 '24

> take a look at Basic too, which seems to have dropped, perhaps because of programmability

Fiddling locally, I'm seeing ~56k with the current Basic (with programmability). If I remove DynamicJSEndpointRegistry::find_endpoint, which does a few KV lookups even when the tables are empty, I get back to 61k. I think we should fork a separate programmability app, to keep Basic as a minimal perf test.
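For scale, the approximate local figures above put the cost of that lookup at roughly 8% of throughput. Assuming the figures are requests per second, and reading throughput as the inverse of per-request time (a rough approximation that only strictly holds for serial processing), the extra work comes to about 1.5 µs per request:

```latex
\frac{61\,\text{k} - 56\,\text{k}}{61\,\text{k}} \approx 0.082
\qquad
\frac{1}{56\,000\ \text{s}^{-1}} - \frac{1}{61\,000\ \text{s}^{-1}}
  \approx 17.9\,\mu\text{s} - 16.4\,\mu\text{s} \approx 1.5\,\mu\text{s}
```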

eddyashton, Jun 13 '24

@eddyashton Agreed, that'd be useful for clarity, especially since further programmability changes are coming. It will increase build times though, so we may want to find something we can sacrifice too.

achamayou, Jun 13 '24