cloudstate

Observability story

Open viktorklang opened this issue 5 years ago • 2 comments

Define what metrics can and should be exposed by the platform.

viktorklang, Mar 28 '19 15:03

I'd say we should leverage other solutions for metrics collection as much as possible. For example, allow Knative and/or Istio to collect metrics on requests. We should only supply metrics to fill the gaps, including:

  • Database access times, for reads and writes
  • Snapshot read/write times
  • Active entity hit ratio
  • Active entities
  • Active entity lifetime (i.e., how long an entity lives before it gets passivated)
  • Recovery event counts
  • Entity shard distribution (i.e., number of entities per shard)
  • Local shard hit ratio (should be 1:n, where n is the number of nodes, but if we try to implement shard affinities with the load balancer, it could be different)
  • User function latency
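
To make a few of the metrics above concrete, here is a rough sketch of how the active entity count, active entity hit ratio, and entity lifetime could be derived from request and passivation events. It is only an illustration: `EntityMetrics`, `on_request`, `on_passivate`, and `expected_local_shard_hit_ratio` are hypothetical names, not part of Cloudstate or any existing library, and the last function assumes "1:n" means a request lands on the owning node roughly one time in n when the load balancer has no shard affinity.

```python
from dataclasses import dataclass, field


@dataclass
class EntityMetrics:
    """Hypothetical in-memory tracker for illustration; not a Cloudstate API."""

    hits: int = 0    # requests that found the entity already active
    misses: int = 0  # requests that had to activate the entity first
    activated_at: dict = field(default_factory=dict)  # entity id -> activation time (seconds)
    lifetimes: list = field(default_factory=list)     # seconds each entity was active before passivation

    def on_request(self, entity_id: str, now: float) -> None:
        # A request for an already-active entity counts as a hit;
        # otherwise the entity is activated and the request counts as a miss.
        if entity_id in self.activated_at:
            self.hits += 1
        else:
            self.misses += 1
            self.activated_at[entity_id] = now

    def on_passivate(self, entity_id: str, now: float) -> None:
        # Record how long the entity was active, then forget it.
        started = self.activated_at.pop(entity_id, None)
        if started is not None:
            self.lifetimes.append(now - started)

    @property
    def active_entities(self) -> int:
        return len(self.activated_at)

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def expected_local_shard_hit_ratio(nodes: int) -> float:
    # Assumption: requests are spread evenly across nodes with no shard
    # affinity, so the owning node is hit about 1 time in n.
    return 1.0 / nodes


metrics = EntityMetrics()
metrics.on_request("cart-1", now=0.0)     # activates cart-1 (miss)
metrics.on_request("cart-1", now=1.0)     # cart-1 already active (hit)
metrics.on_request("cart-2", now=2.0)     # activates cart-2 (miss)
metrics.on_passivate("cart-1", now=30.0)  # cart-1 was active for 30 seconds
```

In a real deployment these values would more likely be exported as counters, gauges, and histograms through whatever metrics library the proxy uses, rather than kept in a plain object like this.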

jroper, Jul 09 '19 02:07

Telemetry for event-sourced entities in #349, which covers persistence and entity metrics.

pvlugter, Jun 30 '20 03:06