energy monitoring of the ocaml.org cluster

As part of reducing the emissions resulting from the ocaml.org infrastructure, we first need to systematically track and measure each service. This issue tracks progress towards determining our current energy expenditure; initiatives to reduce and consolidate will then follow as appropriate.

[x] have a machine-readable list of the physical/virtual machines we are operating, and their locations. We almost have this now with the information in this repository, @mtelvers?
obtain more specific information about the energy usage of the various data centres we use:
- [ ] @patricoferris and @avsm have requested access to the University of Cambridge's energy monitoring platform for all the cluster nodes hosted there (12 or so)
- [ ] we use Scaleway's Paris2 data centre, which has some statistics here
- [ ] Equinix Metal and Works on ARM have sustainability reports, but we need to find more specific information.
[x] deploy Clarke against the Prometheus instance so that we are tracking each machine's energy. @patricoferris and @mtelvers are handling this. As an aside, do we have an ocaml.org-specific instance of Grafana/Prometheus, or is it still running on status.ci.ocamllabs.io?
[ ] publish the data on ocaml.org, and link to it from https://ocaml.org/policies/carbon-footprint
[x] publish blog post https://github.com/ocaml/infrastructure/pull/45 and cross-post to discuss.ocaml.org
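
Tracking each machine's energy eventually means turning power readings into energy totals. The following is only a minimal sketch of that arithmetic, assuming the readings are available as (Unix timestamp in seconds, watts) pairs. The function name `energy_kwh` and the sample values are hypothetical and are not taken from Clarke or Prometheus.

```python
JOULES_PER_KWH = 3_600_000.0  # 1 kWh = 1000 W * 3600 s


def energy_kwh(samples):
    """Integrate (timestamp_seconds, watts) samples into kWh.

    Uses the trapezoidal rule between consecutive samples, so it
    tolerates irregular sampling intervals. Fewer than two samples
    gives 0.0, because no time interval is covered.
    """
    ordered = sorted(samples)  # sort by timestamp
    joules = 0.0
    for (t0, w0), (t1, w1) in zip(ordered, ordered[1:]):
        joules += 0.5 * (w0 + w1) * (t1 - t0)
    return joules / JOULES_PER_KWH


if __name__ == "__main__":
    # Hypothetical readings: a machine drawing a steady 100 W for one hour.
    readings = [(0, 100.0), (1800, 100.0), (3600, 100.0)]
    print(f"{energy_kwh(readings):.3f} kWh")
```

Integrating between samples rather than multiplying an average by the elapsed time keeps the total sensible when scrapes are missed or unevenly spaced.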

Once this is done, we can have a review of the actual services and determine how to reduce the footprint.

Jan 03 '23 14:01 avsm

I have added a machine-readable list of machines, machines.csv, which is built automatically as part of the GitHub Pages website. The fields are populated from the _machines/* YAML files. We can adjust the fields in the CSV as needed. To obtain the latest CSV, run, for example, curl https://infra.ocaml.org/machines.csv.
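
For anyone who wants to consume the CSV from a script rather than curl, here is a rough sketch using only the Python standard library. The column names `name` and `location` and the sample rows are assumptions for illustration; the real header comes from the _machines/* YAML fields and may differ.

```python
import csv
import io
import urllib.request
from collections import Counter

MACHINES_CSV_URL = "https://infra.ocaml.org/machines.csv"  # URL given above


def fetch_csv_text(url=MACHINES_CSV_URL, timeout=30):
    """Download the CSV and return it as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_machines(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def count_by(rows, column):
    """Count rows per value of `column`; missing or empty values become 'unknown'."""
    return Counter((row.get(column) or "unknown") for row in rows)


# Hypothetical sample data; the real file's columns may differ.
SAMPLE = "name,location\nalpha,Cambridge\nbeta,Paris\ngamma,Cambridge\ndelta,\n"

if __name__ == "__main__":
    rows = parse_machines(SAMPLE)  # or: parse_machines(fetch_csv_text())
    for location, count in count_by(rows, "location").most_common():
        print(location, count)
```

Keeping the download separate from the parsing means the grouping logic can be checked offline against a small sample before pointing it at the live file.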

Jan 09 '23 14:01 mtelvers

> As an aside, do we have an ocaml.org-specific instance of Grafana/Prometheus, or is it still running on status.ci.ocamllabs.io?

It is a shared instance for the cluster.

We should also move to the opam versions of Clarke's dependencies (https://github.com/ocurrent/clarke/issues/12) for ongoing maintenance.

Feb 28 '23 06:02 tmcgilchrist

@dra27 mentioned https://github.com/ocaml/ocaml/pull/11903: this OCaml 5 issue might impact us, since we would be running long-running OCaml 5 daemons.

Feb 28 '23 23:02 tmcgilchrist

PR for building and pushing Clarke images https://github.com/ocurrent/ocurrent-deployer/pull/176

Mar 22 '23 22:03 patricoferris

@patricoferris @avsm I'll leave you to post on discuss.ocaml.org regarding the infra blog post. http://infra.ocaml.org/2023/05/30/emissions-monitoring.html

Jun 06 '23 01:06 tmcgilchrist
