
Unable to send notifications after NFS failure (@visual61 & @kryz_9850)

Open • cconstab opened this issue 1 year ago • 1 comment

Describe the bug

After an NFS failure today, these two atSigns could not send notifications. The log files looked fine and I was able to authenticate to the atServer; notifications were accepted by the secondary but never sent on to the recipient atSign.

This was tested by using at_talk before restarting the atSigns' atServers.

at_talk was able to log in to the atServer, which would accept notifications, but they were never received.

After restarting the atServer, sending and receiving came back. I restarted visual61 and it could send; kryz_9850 could receive but could not send until it was also restarted.

Hopefully other atServers are not affected, but these atServers are busy all the time and both were affected. Interestingly, the atGPS demo was unaffected, and kryz_9850 sending to visual61 was not affected in the first outage at 1:46 but was taken down in the 4 PM outage (see screenshots).

All client apps also had to be restarted before the notifications came through, which was also ugly.

As soon as the atServer came back online there was a huge flurry of notifications, so it would seem none were lost.

This could be a corner case or a major issue?

Steps to reproduce

Unknown

Expected behavior

This should never happen

Additional context

KRYZ Graphs (stats from kryz_9850 to visual61): [screenshot, 2023-03-02 at 22:34:33]

GCP Secondaries load average during the NFS event: [image]

cconstab, Mar 03 '23 06:03

Possible post-mortem item?

ksanty, Mar 06 '23 16:03