energy monitoring of the ocaml.org cluster
As part of the process to reduce our emissions resulting from the ocaml.org infrastructure, we need to first systematically track and measure each service. This issue tracks progress towards determining our current energy expenditure, and we will then have initiatives to reduce and consolidate as appropriate.
- [x] have a machine readable list of physical/virtual machines we are operating, and their locations. We almost have this now with the information in this repository, @mtelvers?
- obtain more specific information about the energy usage of the various data centres we use.
  - [ ] @patricoferris and @avsm have requested access to the University of Cambridge's energy monitoring platform for all the cluster nodes hosted there (12 or so)
  - [ ] we use Scaleway's Paris2 datacentre, which has some statistics here
  - [ ] Equinix Metal and Works on ARM have sustainability reports, but we need to find more specific information.
- [x] deploy Clarke against the Prometheus instance so we are tracking each machine's energy. @patricoferris and @mtelvers are handling this. As an aside, do we have an ocaml.org-specific instance of Grafana/Prometheus, or is it still running on status.ci.ocamllabs.io?
- [ ] publish the data on ocaml.org, and link to it from https://ocaml.org/policies/carbon-footprint
- [x] publish blog post https://github.com/ocaml/infrastructure/pull/45 and cross post to discuss.ocaml.org
Once this is done, we can have a review of the actual services and determine how to reduce the footprint.
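As a rough illustration of the arithmetic behind "tracking each machine's energy", here is a minimal sketch that turns power samples into energy and an emissions estimate. It assumes the monitoring yields timestamped power readings in watts; the function names, the sample values, and the carbon intensity figure are hypothetical placeholders, not measured data from the cluster.

```python
def energy_kwh(samples):
    """Integrate (timestamp_seconds, watts) samples with the trapezoidal rule.

    `samples` must be sorted by timestamp. Returns energy in kWh.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Average power over the interval times its duration gives joules.
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / 3_600_000.0  # 1 kWh = 3.6e6 J


def emissions_g(kwh, intensity_g_per_kwh):
    """Estimate emissions in grams of CO2e for a given grid carbon intensity."""
    return kwh * intensity_g_per_kwh


# Hypothetical example: a machine drawing a steady 100 W for one hour.
samples = [(0, 100.0), (1800, 100.0), (3600, 100.0)]
kwh = energy_kwh(samples)
# 200 g/kWh is a placeholder intensity, not a figure for any real data centre.
print(kwh, emissions_g(kwh, 200.0))
```

The per-data-centre information in the task list above is what would replace the placeholder intensity with something meaningful.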
I have added a machine-readable list of machines, `machines.csv`, which is built automatically as part of the GH pages website. The fields are populated from the `_machines/*` YAML files. We can adjust the fields in the CSV as needed. To obtain the latest CSV, run, for example, `curl https://infra.ocaml.org/machines.csv`.
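For anyone wanting to consume the CSV programmatically, a minimal sketch follows. The column names (`name`, `location`) and the sample rows are assumptions for illustration only; the real fields are whatever the `_machines/*` YAML files provide, so adjust `location_field` accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the downloaded machines.csv.
SAMPLE_CSV = """name,location
machine-a,Cambridge
machine-b,Cambridge
machine-c,Paris
"""


def machines_per_location(csv_text, location_field="location"):
    """Count machines per location from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[location_field] for row in reader)


counts = machines_per_location(SAMPLE_CSV)
for location, n in sorted(counts.items()):
    print(f"{location}: {n}")
```

With the real file, the text passed to `machines_per_location` would be the output of the `curl` command above.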
> As an aside, do we have an ocaml.org-specific instance of Grafana/Prometheus, or is it still running on status.ci.ocamllabs.io?
It is a shared instance for the cluster.
We should also move to opam versions of clarke's dependencies https://github.com/ocurrent/clarke/issues/12 for ongoing maintenance.
@dra27 mentioned that this OCaml 5 issue, https://github.com/ocaml/ocaml/pull/11903, might affect us if we run long-running OCaml 5 daemons.
PR for building and pushing Clarke images: https://github.com/ocurrent/ocurrent-deployer/pull/176
@patricoferris @avsm I'll leave you to post on discuss.ocaml.org regarding the infra blog post. http://infra.ocaml.org/2023/05/30/emissions-monitoring.html