Installation fails if host clock is off

Open jabl opened this issue 7 years ago • 5 comments

Installation of compute nodes fails if the clock is off. We had one node where the HW clock was for some reason set to 2008, so yum failed because the EPEL repo HTTPS cert was issued after that date. Because of that, ansible-pull-script failed as well, and the node was left in a half-installed state.

Perhaps a solution would be to run "ntpdate {{ ntp_config_servers[0] }}" in the kickstart post script?

jabl avatar Jan 26 '17 08:01 jabl

Sounds reasonable to me. Should we also use the hwclock command to set the BIOS time?

There is apparently a | random filter that we could use here to be nicer to NTP. http://docs.ansible.com/ansible/playbooks_filters.html#random-number-filter

martbhell avatar Jan 26 '17 16:01 martbhell

Ntpdate is one UDP packet. The load is minimal.

Hwclock is a good idea

tiggi avatar Jan 26 '17 16:01 tiggi

On 2017-01-26 18:04, Johan Guldmyr wrote:

> Sounds reasonable to me. Should we also use the hwclock command to set the BIOS time?
>
> There is apparently a | random filter that we could use here to be nicer to NTP. http://docs.ansible.com/ansible/playbooks_filters.html#random-number-filter

The kickstart file already has this:

{% if kickstart_pre_option is defined %}
{% if kickstart_extra_pre_commands is defined %}
################################################################################
{{ kickstart_pre_option }}
{{ kickstart_extra_pre_commands }}
%end
{% endif %}
{% endif %}

So define kickstart_pre_option, and set "ntpdate time.mikes.fi" + "hwclock -w" as kickstart_extra_pre_commands.
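As a rough sketch, the two commands suggested above could be wrapped like this when placed in the body of the pre section. It assumes that ntpdate and hwclock are both present in the installer environment and that the network is already up at that point, neither of which this thread confirms. The function name sync_and_save_clock is made up for illustration.

```shell
#!/bin/sh
# Sketch only: set the system clock from NTP, then save it to the hardware clock.
# Assumption: ntpdate and hwclock exist in the installer environment (not confirmed).

sync_and_save_clock() {
    server="$1"
    # Set the system clock from the given NTP server.
    if ntpdate "$server"; then
        # Write the corrected system time to the RTC (BIOS clock).
        hwclock -w
    else
        echo "clock sync against $server failed, leaving hardware clock alone" >&2
        return 1
    fi
}

# time.mikes.fi is the server named in the comment above.
sync_and_save_clock time.mikes.fi || true
```

Writing the hardware clock only after a successful sync avoids persisting a bad time, and the trailing "|| true" keeps a failed sync from aborting the rest of the script.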

Doing it in ansible on the host is a bit late..

However: Is this one of the SL390G7 boxes? Then it has probably lost its RTC and is basically junk. It will get a random time every time it boots.

tiggi avatar Jan 26 '17 21:01 tiggi

The node where I saw this was a Westmere node, yes, so it likely has a dead RTC battery. In that case setting the hardware clock during installation won't help. Shame to junk an otherwise functioning node just due to this, though.

Maybe run ntpdate from rc.local, somewhat ugly though..
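A minimal sketch of what such an rc.local addition might look like, assuming ntpdate is installed on the node and that the network may not be ready immediately at boot (hence the retry loop). NTP_SERVER, NTP_TRIES and NTP_DELAY are hypothetical knobs with placeholder defaults, not fgci-ansible variables, and set_clock_at_boot is an illustrative name.

```shell
#!/bin/sh
# Sketch only: set the clock from NTP at boot on a node whose RTC is unreliable.
# NTP_SERVER, NTP_TRIES and NTP_DELAY are hypothetical settings with placeholder defaults.
NTP_SERVER="${NTP_SERVER:-ntp.example.org}"
NTP_TRIES="${NTP_TRIES:-5}"
NTP_DELAY="${NTP_DELAY:-2}"

set_clock_at_boot() {
    # Bail out early if ntpdate is not available at all.
    command -v ntpdate >/dev/null 2>&1 || { echo "ntpdate not installed" >&2; return 1; }
    i=1
    while [ "$i" -le "$NTP_TRIES" ]; do
        # Stop at the first successful sync.
        if ntpdate "$NTP_SERVER"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$NTP_DELAY"
    done
    echo "could not set clock from $NTP_SERVER after $NTP_TRIES tries" >&2
    return 1
}

# Never fail the boot just because the clock could not be set.
set_clock_at_boot || true
```

No hwclock call is included here, since writing to an RTC with a dead battery would not survive the next power cycle anyway.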

jabl avatar Jan 27 '17 13:01 jabl

Is the rc.local idea meant for the case where the hardware clock is off on every boot?

From "systemd-analyze plot" it seems that chronyd starts before slurmd. Is chronyd not adjusting the clock? We have stepping enabled with "makestep 10 3".
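For context, "makestep 10 3" tells chronyd to step the clock when the needed correction is larger than 10 seconds, but only during the first three clock updates after it starts. One way to check whether chronyd has actually synchronized by the time slurmd matters is sketched below. It assumes chronyc is installed and chronyd is running; the function name and the number of tries are illustrative.

```shell
#!/bin/sh
# Sketch only: show chronyd's view of the clock and wait briefly for synchronization.
# Assumption: chronyc is installed and chronyd is running on the node.

check_chrony_sync() {
    command -v chronyc >/dev/null 2>&1 || { echo "chronyc not installed" >&2; return 1; }
    # Show the current offset, reference source and leap status.
    chronyc tracking
    # Poll until chronyd reports that it is synchronized, giving up after 6 tries.
    chronyc waitsync 6
}

check_chrony_sync || echo "chronyd is not synchronized (or not running)" >&2
```

If waitsync keeps failing on the affected node, that would suggest chronyd has not been able to step or slew the clock yet, for example because no NTP source was reachable when it started.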

martbhell avatar Jan 31 '17 05:01 martbhell