CCF Find a way to restore or replace cimetrics

We disabled cimetrics in #6125 to avoid using METRICS_MONGO_CONNECTION.

We should look at restoring cimetrics, either with improved auth to mongo, or by finding an alternative storage mechanism.

Apr 16 '24 13:04 eddyashton

It looks like bencher.dev may be what we want.

Test job: https://github.com/microsoft/CCF/blob/bencher_experiment/.github/workflows/bencher.yml. It works on Ubuntu 20.04 using a 1ES pool.

The JSON adapter supports both latency and throughput, and the format is very simple: https://bencher.dev/docs/explanation/adapters/#-json
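To give a feel for how simple the format is, here is a minimal sketch of a converter from per-test results into what we understand Bencher Metric Format to be: a top-level object keyed by benchmark name, where each benchmark maps measure slugs such as `latency` or `throughput` to an object with a `value`. The `to_bmf` helper and the example numbers are hypothetical and not taken from CCF's test scripts.

```python
import json


def to_bmf(results):
    """Convert {benchmark: {measure_slug: number}} into Bencher Metric Format.

    Each benchmark maps measure slugs (e.g. "latency", "throughput") to an
    object holding at least a "value".
    """
    return {
        benchmark: {slug: {"value": float(v)} for slug, v in measures.items()}
        for benchmark, measures in results.items()
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are not real CCF results.
    example = {"Logging": {"throughput": 1000.0, "latency": 2.0}}
    print(json.dumps(to_bmf(example), indent=2))
```

A single benchmark can carry both a latency and a throughput entry, so each end-to-end test would only need to emit one object.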

I think we would move the performance tests to GitHub Actions (GHA) for simplicity, and start by running them only on main. Running on PR branches seems to need some care to avoid exposing secrets, but we may be able to do without that and just check periodically.

May 10 '24 14:05 achamayou

pi_basic_virtual_cft

[x] Don't need the ^ anymore
[x] Can we create a Memory slug for the _mem?
[x] Move other end-to-end tests
[x] Use high/low values?
[x] Micro-benchmarks?
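For the high/low values and the Memory slug above, one possible approach is to collapse repeated samples into a value plus `lower_value`/`upper_value` bounds, and to route any metric ending in `_mem` to a separate memory measure. This is a sketch under assumptions: the `memory` slug, the `_mem` suffix rule, the `throughput` default and all helper names are hypothetical, and min/max bounds are just one choice (percentiles would be another).

```python
import statistics

MEMORY_SUFFIX = "_mem"  # hypothetical naming convention for memory metrics
MEMORY_SLUG = "memory"  # hypothetical custom measure slug


def summarise(samples):
    """Collapse repeated samples into a value with low/high bounds."""
    return {
        "value": statistics.mean(samples),
        "lower_value": min(samples),
        "upper_value": max(samples),
    }


def measure_for(metric_name, default_slug="throughput"):
    """Return (benchmark name, measure slug); "_mem" metrics go to memory."""
    if metric_name.endswith(MEMORY_SUFFIX):
        return metric_name[: -len(MEMORY_SUFFIX)], MEMORY_SLUG
    return metric_name, default_slug


def build_bmf(raw):
    """Build a Bencher-style dict from {metric_name: [samples, ...]}."""
    bmf = {}
    for metric_name, samples in raw.items():
        benchmark, slug = measure_for(metric_name)
        bmf.setdefault(benchmark, {})[slug] = summarise(samples)
    return bmf
```

With this scheme, `pi_basic_virtual` and `pi_basic_virtual_mem` end up as two measures on the same benchmark rather than as two unrelated benchmarks.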

May 10 '24 17:05 achamayou

Everything has been moved over now. I think there are probably two last things to explore before closing this task:

[x] Rename the metrics to something more sensible.
[x] Split the micro-benchmarks into a separate slug, since they're on a fundamentally different scale from the end-to-end tests. This could still be quite generic, e.g. Rate.

Proposed renames:

CHAMP get (/s) -> CHAMP get
CHAMP put (/s) -> CHAMP put
KV deser (/s) -> KV deserialisation
KV ser (/s) -> KV serialisation
KV snap deser (/s) -> KV snapshot deserialisation
KV snap ser (/s) -> KV snapshot serialisation
RB get (/s) -> RBMap get
RB put (/s) -> RBMap put
commit_latency_ratio -> Commit
- For this one, I think we should ideally move away from a ratio to real latency, and split it into multiple runs by signature interval, e.g. Commit (100ms sig_ms_interval)
historical_queries -> Historical Queries
pi_basic_js_virtual -> Basic JS
pi_basic_mt_virtual -> Basic Multi-threaded
pi_basic_virtual -> Basic
pi_ls_jwt_virtual -> Logging JWT
pi_ls_virtual -> Logging
[x] Post throughput from pi_ls tests
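The renames above, together with the separate micro-benchmark slug, could live in a single lookup table. In this sketch the micro-benchmarks are picked out by their `(/s)` suffix, which is true of the list above. The `rate` slug follows the "e.g. Rate" suggestion but is hypothetical. The end-to-end measure slug is left to the caller rather than guessed per test, and all helper names are illustrative.

```python
MICRO_BENCHMARK_SLUG = "rate"  # hypothetical slug for micro-benchmarks

# Old cimetrics name -> proposed Bencher benchmark name.
RENAMES = {
    "CHAMP get (/s)": "CHAMP get",
    "CHAMP put (/s)": "CHAMP put",
    "KV deser (/s)": "KV deserialisation",
    "KV ser (/s)": "KV serialisation",
    "KV snap deser (/s)": "KV snapshot deserialisation",
    "KV snap ser (/s)": "KV snapshot serialisation",
    "RB get (/s)": "RBMap get",
    "RB put (/s)": "RBMap put",
    "commit_latency_ratio": "Commit",
    "historical_queries": "Historical Queries",
    "pi_basic_js_virtual": "Basic JS",
    "pi_basic_mt_virtual": "Basic Multi-threaded",
    "pi_basic_virtual": "Basic",
    "pi_ls_jwt_virtual": "Logging JWT",
    "pi_ls_virtual": "Logging",
}


def is_micro_benchmark(old_name):
    # In the list above, only the micro-benchmarks carry the "(/s)" suffix.
    return old_name.endswith("(/s)")


def commit_name(sig_ms_interval):
    """Name for a Commit run split out by signature interval."""
    return f"Commit ({sig_ms_interval}ms sig_ms_interval)"


def to_bmf_entry(old_name, value, e2e_slug):
    """Rename one result and file it under the appropriate measure slug."""
    slug = MICRO_BENCHMARK_SLUG if is_micro_benchmark(old_name) else e2e_slug
    return {RENAMES[old_name]: {slug: {"value": float(value)}}}
```

Keeping the mapping in one place means a later rename only touches the table, not every test that reports a result.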

Jun 05 '24 13:06 achamayou

Saving the last copy of the trend plots before we drop the storage account:

Jun 12 '24 08:06 achamayou

After review with @eddyashton, we additionally want to:

[x] Check how the MT numbers are reported; they look low
[x] Add TPCC
[x] Take a look at Basic too, which seems to have dropped, perhaps because of programmability

Jun 12 '24 09:06 achamayou

> take a look at Basic too, which seems to have dropped, perhaps because of programmability

Fiddling locally, I'm seeing ~56k with the current Basic app (with programmability). If I remove the call to DynamicJSEndpointRegistry::find_endpoint, which performs a few KV lookups even when the tables are empty, I get back to 61k. I think we should fork a separate programmability app, so that Basic remains a minimal perf test.
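Taking the two local figures above at face value (both are approximate), the lookup in DynamicJSEndpointRegistry::find_endpoint accounts for a drop of roughly 8% in the Basic numbers:

```latex
\frac{61\text{k} - 56\text{k}}{61\text{k}} = \frac{5}{61} \approx 8.2\%
```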

Jun 13 '24 10:06 eddyashton

@eddyashton Agreed, that would be useful for clarity, especially since further changes to programmability are coming. It's going to increase build times, though, so I think we may want to find something we can sacrifice too.

Jun 13 '24 11:06 achamayou