Monitoring/Stats
Erlang/OTP provides rich support for monitoring and profiling the state of a running system. This support can be enabled at any time, doesn't require special compilation and has negligible impact on runtime performance.
To what extent can we provide similar facilities for CH? Do they belong in the platform or core CH layers? What would such metrics look like, and what kind of APIs would we want to expose for reading them?
See: EKG, especially the core part, where gauges, counters, labels, etc. can be installed and then consumed/monitored.
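To make that concrete, here is a minimal sketch of registering and updating metrics with the ekg-core package. The metric names and the values written to them are made up for illustration; nothing here is an agreed CH naming scheme.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified System.Metrics as Metrics
import qualified System.Metrics.Counter as Counter
import qualified System.Metrics.Gauge as Gauge
import qualified System.Metrics.Label as Label

main :: IO ()
main = do
  -- A store holds every metric registered by this OS process.
  store <- Metrics.newStore

  -- Hypothetical metric names, chosen only for this example.
  sent    <- Metrics.createCounter "ch.messages.sent" store
  mailbox <- Metrics.createGauge "ch.mailbox.size" store
  role    <- Metrics.createLabel "ch.node.role" store

  Counter.inc sent          -- counters only ever go up
  Gauge.set mailbox 42      -- gauges can be set to any value
  Label.set role "worker"   -- labels carry free-form text

  -- Take a snapshot of all metrics in the store and print it.
  sample <- Metrics.sampleAll store
  print sample
```

A consumer (a web UI, a log writer, a remote collector) would only need access to the store and `sampleAll`, so the code that updates metrics stays independent of whatever reads them.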
At first, a set of per-OS-process metrics, and then an automatic way of aggregating the metrics across the CH cluster. Erlang probably has some good role models here too.
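As a rough sketch of what the aggregation step could look like, assume each node reports a map from metric name to an integer value. The `NodeName` and `Sample` types and the metric names below are hypothetical, not an existing CH API.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map

-- Hypothetical types for this sketch.
type NodeName = String
type Sample   = Map.Map String Int

-- | Combine per-node samples into a cluster-wide view by summing
-- the values reported under the same metric name.
aggregate :: [(NodeName, Sample)] -> Sample
aggregate = Map.unionsWith (+) . map snd

main :: IO ()
main = do
  let samples =
        [ ("node-a", Map.fromList [("messages.sent", 10), ("processes.alive", 3)])
        , ("node-b", Map.fromList [("messages.sent", 5),  ("processes.alive", 7)])
        ]
  print (Map.toList (aggregate samples))
  -- prints [("messages.sent",15),("processes.alive",10)]
```

Summing is only the right combination for counter-like metrics. Gauges would more likely want a max, min or average, and labels cannot be merged numerically at all, so a real aggregator would need to know each metric's kind.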
Sounds like a good place to start. Erlang provides stats on a per-Erlang-process (i.e., per-green-thread) basis. It makes sense to do that there because each Erlang process is garbage collected individually and has its own stack and heap. That might make less sense for Haskell, where garbage collection is global. Reminds me to look at whether the GHC parallel GC efforts got anywhere.
Anyway, having EKG would be a start. Need to look at what constraints this puts on how the code needs to be compiled and decide whether they're acceptable or whether that should just be optional.
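One constraint I'm fairly sure of: GHC only collects GC statistics when the runtime is started with the `-T` RTS option (which can be baked in at link time with `-with-rtsopts=-T`). A program can check for this at runtime via `GHC.Stats`, so the feature can stay optional. A minimal sketch, assuming a GHC recent enough to ship `getRTSStats`:

```haskell
module Main (main) where

import GHC.Stats
  ( allocated_bytes
  , gcs
  , getRTSStats
  , getRTSStatsEnabled
  , max_live_bytes
  )

main :: IO ()
main = do
  -- True only if the RTS was started with -T (or a stronger stats flag).
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      stats <- getRTSStats
      putStrLn ("GCs so far:      " ++ show (gcs stats))
      putStrLn ("Bytes allocated: " ++ show (allocated_bytes stats))
      putStrLn ("Max live bytes:  " ++ show (max_live_bytes stats))
    else
      -- Degrade gracefully instead of failing when stats are off.
      putStrLn "RTS statistics are disabled; rebuild with -with-rtsopts=-T to enable them."
```

Checking `getRTSStatsEnabled` first means the monitoring code can be compiled in unconditionally and simply report nothing when the flag is missing.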
We should also consider providing OS-level stats via a background worker; see the Erlang os_mon application for an example.
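A very rough sketch of such a background worker, using a plain Haskell thread rather than a CH process to keep it self-contained. `startSampler` is a hypothetical helper, and reading `/proc/loadavg` is a Linux-only stand-in for a real OS stats source.

```haskell
module Main (main) where

import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Exception (IOException, try)
import Control.Monad (forever)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | Run the sampling action every 'intervalMicros' microseconds on a
-- background thread, keeping the most recent successful result.
-- Returns the cell holding the latest value and an action to stop the worker.
startSampler :: Int -> IO a -> IO (IORef (Maybe a), IO ())
startSampler intervalMicros sample = do
  ref <- newIORef Nothing
  tid <- forkIO $ forever $ do
    r <- try sample
    case r of
      Right v -> writeIORef ref (Just v)
      Left e  -> ignore e   -- keep the previous value if sampling fails
    threadDelay intervalMicros
  return (ref, killThread tid)
  where
    -- Only I/O errors are swallowed, so killThread still stops the worker.
    ignore :: IOException -> IO ()
    ignore _ = return ()

-- | Linux-only example source: the raw contents of /proc/loadavg.
readLoadAvg :: IO String
readLoadAvg = do
  s <- readFile "/proc/loadavg"
  length s `seq` return s   -- force the lazy read so the file is closed

main :: IO ()
main = do
  (latest, stop) <- startSampler 100000 readLoadAvg
  threadDelay 250000        -- give the worker time to take a sample
  readIORef latest >>= print
  stop
```

On systems without `/proc/loadavg` the sampler fails quietly and the latest value stays `Nothing`. In CH proper the worker would presumably be a supervised process that publishes samples as messages instead of writing to an `IORef`.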
Support for SNMP would also be nice.
User-defined, updatable performance counters would also be handy, though I wonder whether that already exists and/or should really live outside CH/Platform.
Also see the issue I raised for core CH to provide useful metrics here.
There is also some initial process status work in distributed-process issue 89, which I'm currently working on.
I think the focus here will be on statistics, monitoring and subscribing to system events. There is definitely an overlap with the management API I've proposed for CH and even more so the management tools for -platform.