Cluster telemetry
neon-cluster-operator needs to periodically upload a few key metrics about the cluster to our headend. The upload interval should be specified via a new neon-cluster-operator-specific CRD that will also configure other things like root CA updates, Linux security patches, etc. Metric upload will be enabled by default, but we'll allow users to disable it via the CRD.
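As a sketch of the shape this CRD could take, here's a hypothetical spec written as Go/kubebuilder-style types (the operator itself is .NET, so this is illustrative only; the resource name `ClusterOperatorSettings` and all field names are assumptions, not an actual schema):

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ClusterOperatorSettingsSpec sketches the operator configuration described
// above.  All field names here are hypothetical.
type ClusterOperatorSettingsSpec struct {
	// TelemetryEnabled controls whether cluster metrics are uploaded to the
	// headend.  A nil value means the default, which is enabled.
	TelemetryEnabled *bool `json:"telemetryEnabled,omitempty"`

	// TelemetryInterval is how often metrics are uploaded (e.g. "1h").
	TelemetryInterval metav1.Duration `json:"telemetryInterval,omitempty"`

	// RootCaUpdateEnabled controls periodic root CA certificate updates.
	RootCaUpdateEnabled *bool `json:"rootCaUpdateEnabled,omitempty"`

	// SecurityPatchEnabled controls periodic Linux security patching.
	SecurityPatchEnabled *bool `json:"securityPatchEnabled,omitempty"`
}

// ClusterOperatorSettings is the (hypothetical) cluster-scoped custom resource
// holding neon-cluster-operator configuration.
type ClusterOperatorSettings struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec ClusterOperatorSettingsSpec `json:"spec,omitempty"`
}
```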
These metrics will include (see the payload sketch after the list):
- [ ] Cluster hosting environment
- [ ] neonKUBE version
- [ ] client-id: identifies the client that deployed the cluster
- [ ] organization-id: identifies the organization that owns the cluster
- [ ] created-timestamp: indicates when the cluster was created (UTC)
- [ ] node information: list of information about each node including:
  - [ ] cores
  - [ ] memory
  - [ ] storage
  - [ ] architecture (AMD64/ARM64)
  - [ ] VM size/type (cloud deployments only)
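To make the shape concrete, here's a hypothetical Go sketch of the payload assembled from the fields above (type names, JSON property names, and units are assumptions):

```go
package telemetry

import "time"

// ClusterTelemetry is a hypothetical sketch of the payload that
// neon-cluster-operator would upload to the headend.
type ClusterTelemetry struct {
	HostingEnvironment string          `json:"hostingEnvironment"` // cloud provider or on-premise hypervisor
	NeonKubeVersion    string          `json:"neonKubeVersion"`
	ClientID           string          `json:"clientId"`         // client that deployed the cluster
	OrganizationID     string          `json:"organizationId"`   // organization that owns the cluster
	CreatedTimestamp   time.Time       `json:"createdTimestamp"` // cluster creation time (UTC)
	Nodes              []NodeTelemetry `json:"nodes"`
}

// NodeTelemetry describes a single cluster node.
type NodeTelemetry struct {
	Cores        int    `json:"cores"`
	MemoryBytes  int64  `json:"memoryBytes"`
	StorageBytes int64  `json:"storageBytes"`
	Architecture string `json:"architecture"`     // "amd64" or "arm64"
	VmSize       string `json:"vmSize,omitempty"` // cloud deployments only
}
```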
I think we should upload these to the headend via the new headend client and have the headend persist them to S3 for long-term storage, as well as persist some or all of this information to Loki as metrics that we can view in real time in Grafana.
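On the operator side the flow could look something like the loop below. This is a non-authoritative sketch: the endpoint path is made up and the raw `net/http` call stands in for the new headend client; it reuses the `ClusterTelemetry` type from the previous sketch.

```go
package telemetry

import (
	"bytes"
	"context"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// UploadLoop periodically POSTs collected telemetry to the headend, which
// would persist the raw JSON to S3 and forward selected fields to Loki.
func UploadLoop(ctx context.Context, headendURL string, interval time.Duration, collect func() ClusterTelemetry) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			payload, err := json.Marshal(collect())
			if err != nil {
				return err
			}
			// Hypothetical endpoint; the real call would go through the headend client.
			resp, err := http.Post(headendURL+"/v1/telemetry/cluster", "application/json", bytes.NewReader(payload))
			if err != nil {
				continue // transient failure; try again next tick
			}
			resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				log.Printf("telemetry upload failed: %s", resp.Status)
			}
		}
	}
}
```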
I'm thinking that this could be our first step towards operational data pipelines. I think we should structure this at three levels:
- Permanent S3 storage of raw collected data. This would include some information from clients like neon-desktop and neon-cli as well as cluster information forwarded by neon-cluster-operator. In the future, this could also include information about downloads, new users, revenue, and purchases.
- We'd configure an AWS Athena data lake to hold this data and then configure AWS Glue jobs to scan and index the data into Athena (see the sketch after this list).
- We'd leave segment.io analytics in its own bucket and have AWS Glue jobs scan/index that data into Athena as well.
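For the Glue piece, a minimal sketch using the AWS SDK for Go v2 could register a scheduled crawler over the raw telemetry bucket so the discovered schema lands in a Glue database that Athena can query; the crawler name, IAM role, database name, and bucket path below are all placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/glue"
	"github.com/aws/aws-sdk-go-v2/service/glue/types"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	client := glue.NewFromConfig(cfg)

	// Crawl the raw telemetry bucket daily and register the discovered schema
	// in a Glue database that Athena can query directly.
	_, err = client.CreateCrawler(ctx, &glue.CreateCrawlerInput{
		Name:         aws.String("cluster-telemetry-crawler"),                   // placeholder
		Role:         aws.String("arn:aws:iam::123456789012:role/glue-crawler"), // placeholder
		DatabaseName: aws.String("cluster_telemetry"),                           // placeholder
		Targets: &types.CrawlerTargets{
			S3Targets: []types.S3Target{
				{Path: aws.String("s3://example-telemetry-bucket/raw/")}, // placeholder
			},
		},
		Schedule: aws.String("cron(0 6 * * ? *)"), // daily at 06:00 UTC
	})
	if err != nil {
		log.Fatal(err)
	}
}
```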