
Use slurmdb directly as grafana datasource

Open · sjpb opened this issue 4 years ago • 8 comments

We could potentially use the Grafana MySQL data source plugin (https://grafana.com/docs/grafana/latest/datasources/mysql/) to query the slurmdb directly, rather than having to go via slurm_stats/filebeat/elasticsearch. That would also mean we could ditch podman.

However, slurm_stats also does some munging of the sacct output, e.g. https://github.com/stackhpc/slurm-openstack-tools/blob/bacbf41fd276d4478c44ec5356d048db89128b67/slurm_openstack_tools/sacct.py#L72, which might be hard to replicate in Grafana.

sjpb, Feb 10 '22 09:02

@jovial for info.

sjpb, Feb 10 '22 09:02

Useful: https://wiki.fysik.dtu.dk/niflheim/Slurm_database#slurm-database-tables

sjpb, Feb 10 '22 12:02

Needs something like this adding:

```yaml
grafana_datasources:
  - name: slurmdb
    ds_type: mysql
    ds_url: "{{ groups['control'] | first }}" # {{ openhpc_slurmdbd_host }}; for some reason doesn't need a port
    database: slurm_acct_db # openhpc_slurmdbd_mysql_database
    user: slurm
    password: "{{ vault_mysql_slurm_password }}"
```

although I can't seem to query this.

sjpb, Feb 10 '22 16:02

See dashboard https://grafana.com/grafana/dashboards/15754/edit?pg=dashboards&plcmt=usr-upload, which looks like:

[screenshot of dashboard 15754]

vs. the current slurm_stats → filebeat → opendistro one:

[screenshot of the current dashboard]

sjpb, Feb 15 '22 10:02

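To see the job data (slurmdbd keeps one job table per cluster, named `<cluster>_job_table`, so here `esearch_job_table`):

```
mysql -p -u slurm slurm_acct_db
```

then at the `mysql>` prompt:

```sql
describe esearch_job_table;
select * from esearch_job_table;
exit;
```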

Prometheus regexes use RE2 syntax: https://github.com/google/re2/wiki/Syntax

sjpb, Jul 25 '22 10:07

slurmdbd uses slurm.conf-style "node range expressions", e.g. `linux[0-64,128]` or `lx[15,18,32-33]`. I *think* these can be combined, e.g. a job's node list might look like:

```
linux[0-64,128],lx[15,18,32-33]
```

Note that, from the Prometheus docs (https://prometheus.io/docs/prometheus/latest/querying/basics/):

> Regex matches are fully anchored. A match of `env=~"foo"` is treated as `env=~"^foo$"`.

So really we'd have to expand the whole node list, as the current slurm-openstack-tools Python code does.
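To illustrate why the raw expression can't just be dropped into a Prometheus `=~` matcher, here is a small sketch. It uses Python's `re` rather than RE2 and emulates the anchoring by wrapping the pattern as `^(?:...)$` (via `fullmatch`), which is the behaviour the docs quoted above describe. The helper name `prom_style_match` is hypothetical; for the simple patterns used here `re` and RE2 should agree.

```python
import re


def prom_style_match(pattern: str, value: str) -> bool:
    """Emulate a PromQL `=~` label matcher: the whole value must match.

    Uses Python's `re` (not RE2); equivalent for these simple patterns.
    """
    return re.fullmatch(f"(?:{pattern})", value) is not None


# Read as a regex, "[0-64,128]" is a one-character class
# (the chars 0-6, 4, ',', 1, 2, 8), not a numeric range.
raw = "linux[0-64,128]"
print(prom_style_match(raw, "linux5"))    # True: one char from the class
print(prom_style_match(raw, "linux64"))   # False
print(prom_style_match(raw, "linux128"))  # False

# Expanding to explicit node names and joining with "|" behaves as intended.
expanded = "|".join([f"linux{i}" for i in range(65)] + ["linux128"])
print(prom_style_match(expanded, "linux64"))   # True
print(prom_style_match(expanded, "linux128"))  # True
print(prom_style_match(expanded, "linux640"))  # False
```

Because of the full anchoring, an alternation of exact node names can't accidentally match a longer hostname such as `linux640`.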

sjpb, Jul 25 '22 11:07

Confirmed as of Slurm 22.05.11 that the MySQL database stores the node list in the short "hostlist expression" format, e.g. `slurm-v2-compute-standard-[0-1]`.

See https://github.com/stackhpc/slurm-openstack-tools/blob/09e347902a2603de0be05f01b8a343da23d3c330/slurm_openstack_tools/sacct.py#L73 for the steps to (a) expand this into a list of nodes and (b) turn that into a Prometheus-compatible regular expression (i.e. join the names with `|`).
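A minimal sketch of those two steps, assuming the expression is a comma-separated list of `prefix[ranges]suffix` terms as in the examples above. It is not the sacct.py code and doesn't cover every corner of Slurm's hostlist grammar; the function names are hypothetical, and it assumes node names contain no regex metacharacters, so they are joined unescaped.

```python
import re
from itertools import product


def _split_top_level(expr: str) -> list[str]:
    """Split on commas that are not inside [...] brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return [p for p in parts if p]


def _expand_range(spec: str) -> list[str]:
    """Expand a bracket body like '0-64,128' or '001-003', keeping zero padding."""
    out = []
    for item in spec.split(","):
        if "-" in item:
            lo, hi = item.split("-", 1)
            out.extend(str(n).zfill(len(lo)) for n in range(int(lo), int(hi) + 1))
        else:
            out.append(item)
    return out


def expand_hostlist(expr: str) -> list[str]:
    """Step (a): turn a hostlist expression into explicit node names."""
    hosts = []
    for term in _split_top_level(expr):
        # With one capture group, re.split alternates: literal, bracket body, literal, ...
        pieces = re.split(r"\[([^\]]*)\]", term)
        choices = [[p] if i % 2 == 0 else _expand_range(p) for i, p in enumerate(pieces)]
        hosts.extend("".join(combo) for combo in product(*choices))
    return hosts


def hostlist_to_prom_regex(expr: str) -> str:
    """Step (b): join the node names with '|' for a PromQL =~ matcher."""
    return "|".join(expand_hostlist(expr))


print(hostlist_to_prom_regex("slurm-v2-compute-standard-[0-1]"))
```

Splitting on top-level commas first is what lets combined expressions like `linux[0-64,128],lx[15,18,32-33]` work, and taking the width from the range's lower bound keeps zero-padded names such as `node[001-003]` intact.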

sjpb, Mar 21 '24 15:03

Did some investigatory work in https://github.com/sjpb/no-opensearch, which showed that querying the MySQL DB directly is not trivial. As well as the hostlist expression expansion, sacct output maps uids/gids to names and converts a load of enums into strings. I also found that running jobs (maybe unsurprisingly) don't appear to be in the DB at all.
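A rough sketch of the other two conversions, using only the standard library. It assumes the job table's `id_user`, `id_group` and `state` columns hold numeric ids and the raw job state. It also assumes the low byte of `state` is Slurm's `job_states` enum (mask `JOB_STATE_BASE`, `0xff`, in slurm.h). The name mapping below is written from that enum and should be checked against the Slurm version in use. All function names are hypothetical, and uid/gid lookups only give names if the host running this knows the same users as the cluster.

```python
import grp
import pwd

# Assumed from the job_states enum in slurm.h, with the names sacct prints;
# verify against the Slurm version actually deployed.
JOB_STATE_NAMES = {
    0: "PENDING", 1: "RUNNING", 2: "SUSPENDED", 3: "COMPLETED",
    4: "CANCELLED", 5: "FAILED", 6: "TIMEOUT", 7: "NODE_FAIL",
    8: "PREEMPTED", 9: "BOOT_FAIL", 10: "DEADLINE", 11: "OUT_OF_MEMORY",
}
JOB_STATE_BASE = 0xFF  # low byte: base state; higher bits are flags


def job_state_name(state: int) -> str:
    """Map a raw state integer to a sacct-style state string."""
    return JOB_STATE_NAMES.get(state & JOB_STATE_BASE, f"UNKNOWN({state})")


def user_name(uid: int) -> str:
    """Look up a username, falling back to the numeric uid if unknown here."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def group_name(gid: int) -> str:
    """Look up a group name, falling back to the numeric gid if unknown here."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def munge_row(row: dict) -> dict:
    """Add user/group names and replace the numeric state with its string form."""
    return {
        **row,
        "user": user_name(row["id_user"]),
        "group": group_name(row["id_group"]),
        "state": job_state_name(row["state"]),
    }


print(munge_row({"id_job": 42, "id_user": 0, "id_group": 0, "state": 3}))
```

Doing this in Python keeps the Grafana side simple, but it reintroduces a processing step between the DB and Grafana, which is the trade-off discussed above.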

sjpb, Apr 04 '24 09:04