Setting up `diagnostics` Domain for keeping track of some operational metrics

Open CMCDragonkai opened this issue 1 year ago • 0 comments

Specification

Discussion about #634, #628, and 223c3678ebed6aad1999218d4a15581f48388963 has led to the idea of a diagnostics domain for keeping some operational metrics.

Note that I'm still of the position that operational logs, the kind that come out of STDERR, should be captured by an orchestrator, and that orchestrator can do log analysis, storage, ETL, and visualisation using grafana. This stuff should not be in Polykey core. It just adds too much complexity.

However, some level of internal diagnostics may be useful, especially for remote debugging. Since we have a JS runtime, it should be possible to expose some level of remote debugging, and then keep track of diagnostic statistics about various parts of PK.

A diagnostics domain can be an "internal" domain with no guarantee of API stability, which allows us to throw in whatever we want. However, from a security POV it's important that this does not leak anything sensitive or become a vulnerability.

Some interesting diagnostics will be about the dimensionality of the system and its different domains:

  1. Objects that are live
  2. Uptime of those objects
  3. Memory usage - for detecting memory leaks
  4. CPU usage... etc
  5. Exceptions
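
The list above can be sketched as a small in-memory collector. This is only an illustration under stated assumptions, not a proposed Polykey API: the `Diagnostics` class, its method names, and the `DiagnosticsSnapshot` shape are all hypothetical. The only real APIs used are Node's `process.memoryUsage()`, `process.cpuUsage()`, and `process.uptime()`. To respect the security concern, exceptions are counted by class name only, so messages and stack traces are never stored.

```typescript
// Hypothetical sketch: none of these names exist in Polykey.
type LiveObject = { domain: string; startedAt: number };

type DiagnosticsSnapshot = {
  liveObjects: Array<{ id: string; domain: string; uptimeMs: number }>;
  memory: { rss: number; heapUsed: number; heapTotal: number }; // bytes
  cpu: { userMicros: number; systemMicros: number }; // microseconds
  processUptimeSeconds: number;
  exceptionCounts: Record<string, number>;
};

class Diagnostics {
  protected live: Map<string, LiveObject> = new Map();
  protected exceptions: Map<string, number> = new Map();
  protected now: () => number;

  // The clock is injectable so that uptime can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // 1. Objects that are live: call when an object starts...
  public trackStart(id: string, domain: string): void {
    this.live.set(id, { domain, startedAt: this.now() });
  }

  // ...and when it stops or is destroyed.
  public trackStop(id: string): void {
    this.live.delete(id);
  }

  // 5. Exceptions: keep only a count per error class name,
  // never the message or stack, to avoid leaking sensitive data.
  public recordException(e: unknown): void {
    const name = e instanceof Error ? e.name : 'UnknownError';
    this.exceptions.set(name, (this.exceptions.get(name) ?? 0) + 1);
  }

  public snapshot(): DiagnosticsSnapshot {
    const now = this.now();
    const mem = process.memoryUsage(); // 3. Memory usage
    const cpu = process.cpuUsage(); // 4. CPU usage since process start
    return {
      // 2. Uptime of each live object
      liveObjects: [...this.live.entries()].map(([id, o]) => ({
        id,
        domain: o.domain,
        uptimeMs: now - o.startedAt,
      })),
      memory: {
        rss: mem.rss,
        heapUsed: mem.heapUsed,
        heapTotal: mem.heapTotal,
      },
      cpu: { userMicros: cpu.user, systemMicros: cpu.system },
      processUptimeSeconds: process.uptime(),
      exceptionCounts: Object.fromEntries(this.exceptions),
    };
  }
}
```

Comparing `memory.heapUsed` across successive snapshots while the set of live objects stays constant is one rough way such a collector could help spot a memory leak.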

Additional context

  • #628 - audit focuses on high level events - it represents user behaviour tracking
  • https://github.com/MatrixAI/js-logger/issues/15 - it might be interesting to revisit the opentracing system, since that kind of tracing is relevant here too
  • 223c3678ebed6aad1999218d4a15581f48388963 - see comments about this
  • #598 - recent memory leak debugging that took some time to discover! Has good notes about some of the remote debuggability of node runtimes.

Tasks

  1. ...
  2. ...
  3. ...

CMCDragonkai, Nov 17 '23 01:11