Steve Brasier

155 comments by Steve Brasier

So full solution:
- Persist hostkeys for login e.g. systemd unit save/recover to `/home/root/`
- Fix internal IP for login (so either this is a persistent port, or a reserved...

Fixed in https://github.com/stackhpc/ansible-slurm-appliance/commit/252ff349b6e63aadaa27681cd4dd7f34b50acb59

Seen on alaska (on arcus), reproduced on smslabs

Reproduced on arcus. This kernel module doesn't exist:
```
[root@dev-compute-0 rocky]# ls /lib/modules/$(uname -r)/kernel/arch/x86/kernel/cpu/cpufreq/
```
Tried:
```
yum install cpupowerutils
```
```
[root@dev-compute-0 rocky]# find /lib/modules -type f -iname "*freq*"...
```

ohpc dashboard uses `node_cpu_scaling_frequency_hertz` stats.

Based on https://superuser.com/questions/1624080/why-there-is-no-cpufreq-under-sys-devices-system-cpu-cpu0
```
[rocky@cpuinfo-compute-0 ~]$ curl http://localhost:9100/metrics | grep node_cpu_scaling_frequency_hertz
[rocky@cpuinfo-compute-0 ~]$ cat /boot/config-$(uname -r) | grep CONFIG_CPU_FREQ
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
#...
```
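The empty `grep` result for the metric suggests the exporter has nothing to read. A minimal sketch for checking this directly, assuming (as the linked superuser question implies) that the frequency data would normally appear under `/sys/devices/system/cpu/cpuN/cpufreq`. The function name `has_cpufreq` and the `SYSFS_CPU_ROOT` override are hypothetical, introduced only for illustration:

```shell
#!/usr/bin/env bash
# Sketch: report whether any CPU exposes a cpufreq directory in sysfs.
# SYSFS_CPU_ROOT is a hypothetical override (handy for testing);
# the default is the standard sysfs location.

has_cpufreq() {
    # $1: sysfs CPU root directory.
    # Returns 0 if at least one cpuN/cpufreq directory exists, 1 otherwise.
    local root="${1:-/sys/devices/system/cpu}"
    local d
    for d in "$root"/cpu[0-9]*/cpufreq; do
        # If the glob matches nothing, $d is the literal pattern and -d fails.
        if [ -d "$d" ]; then
            return 0
        fi
    done
    return 1
}

if has_cpufreq "${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"; then
    echo "cpufreq present"
else
    echo "cpufreq missing"
fi
```

If this reports `cpufreq missing` even though `CONFIG_CPU_FREQ=y` is set, the kernel has the framework built in but no scaling driver has registered for the CPUs, which would be consistent with the metric being absent.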

See https://wiki.stackhpc.com/doc/cpu-frequency-RI0IAojfQ7 for more debugging.

Looks like it might be this: https://github.com/vpenso/prometheus-slurm-exporter/issues/55 although that issue says it applies to names of >20 chars. Current nodename is `devrebuild-control`. Will try shorter `devrev` and see ...
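A quick sanity check of the lengths involved, as a sketch. The 20-character threshold is taken from the comment above and is not verified against the exporter; `name_len` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: compare node name lengths against the 20-character threshold
# quoted from the linked issue (threshold not verified here).
LIMIT=20

name_len() {
    # Print the number of characters in the given name.
    local name="$1"
    echo "${#name}"
}

for name in devrebuild-control devrev; do
    len=$(name_len "$name")
    if [ "$len" -gt "$LIMIT" ]; then
        echo "$name: $len chars (over $LIMIT)"
    else
        echo "$name: $len chars (within $LIMIT)"
    fi
done
```

Both names come out within the threshold (18 and 6 characters), which is why the match with that issue is only tentative.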

Useful: https://wiki.fysik.dtu.dk/niflheim/Slurm_database#slurm-database-tables