at_server
Unable to send notifications after NFS failure (@visual61 & @kryz_9850)
Describe the bug
After an NFS failure today, these two atSigns could not send notifications. The log files looked fine and I was able to authenticate to the atServer. Notifications were accepted by the secondary but never sent on to the recipient atSign.
This was tested using at_talk before restarting the atSigns' atServers.
at_talk was able to log in to the atServer, which would accept notifications, but they were never received.
After restarting the atServers, sending and receiving came back. I restarted visual61 first: it could then send, and kryz_9850 could receive but could not send until it was also restarted.
Hopefully other atServers are not affected, but these atServers are busy all the time and both were affected. Interestingly, the atGPS demo was unaffected, and kryz_9850 sending to visual61 was not affected in the first outage at 1:46 but was taken down in the 4 PM outage (see screenshots).
All client apps also had to be restarted before the notifications came through, which was also ugly.
As soon as the atServer came back online there was a huge flurry of notifications, so it would seem none were lost.
This could be a corner case or a major issue?
Steps to reproduce
Unknown
Expected behavior
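This should never happen. Notifications accepted by the secondary should be sent on to the recipient atSign without needing an atServer restart.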
Additional context
KRYZ Graphs (stats from kryz_9850 to visual61)
GCP Secondaries load average during the NFS event
Possible post-mortem item?