Uplink address resolving failure seems to block any new uplink connection

Open snip opened this issue 2 years ago • 3 comments

Hi,

On one of our aprsc servers we got the following error multiple times:

2022/01/03 13:28:06.402512 aprsc[1762:7f4661de2700] INFO: Uplink xxx: address resolving failure of 'xxx' '10152': System error

After this, aprsc logs the same error many more times and never successfully reconnects to its uplink. If I restart aprsc, it immediately connects to its uplink without any issue.

This issue occurred just after server boot, on the first uplink connection attempt (I do not know whether that was also the case for previous occurrences).

The affected server is running aprsc 2.1.10-gd72a17c.

Any idea?

Thanks

snip avatar Jan 04 '22 10:01 snip

Hi,

No immediate ideas, I haven't seen or heard of this happening elsewhere. Perhaps the DNS resolver stub library was somehow incorrectly configured when aprsc started up (resolv.conf, nsswitch.conf, etc.), was configured correctly later on, and the configuration was then reread when aprsc was restarted. Can you reproduce the problem with a new reboot?

I see you emailed me the full log, I'll take a look at it.

  • Hessu

hessu avatar Jan 12 '22 23:01 hessu

Hi @hessu

We have encountered this issue at least 3 times. For today's occurrence I restarted the VM (not only aprsc); after the restart the issue was still there. Restarting aprsc then fixed the issue.

I tried multiple things to reproduce this at home, without any success.

So I added more uplinks to the config file, specifying the IP addresses of the uplinks instead of the hostnames. After a reboot everything seems to work. (In the logs I can see that it still fails to resolve the uplinks specified by hostname, but when it got to the uplinks specified by IP address it was able to connect without any issues.)

More information regarding this server facing the issue:

This server is running Ubuntu 18.04.6 LTS.

/etc/resolv.conf

nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 127.0.0.53

/etc/nsswitch.conf

passwd:         compat systemd
group:          compat systemd
shadow:         compat
gshadow:        files

hosts:          files mdns4_minimal [NOTFOUND=return] dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

snip avatar Jan 15 '22 16:01 snip

Hi @snip,

Is /etc/resolv.conf generated at boot time somehow? Perhaps it has incorrect contents, or is empty, when aprsc starts up, and only gets filled in afterwards. The boot order in the systemd unit file would then need to be adjusted to fix this. Is this a cloud / container environment or such?

After reboot, when the problem is visible, it should be possible to check the modification time of resolv.conf with "stat /etc/resolv.conf" - check the Modify: timestamp, and the process start-up time of aprsc, from aprsc.log.
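
The same comparison can also be sketched in C. This is only an illustration, not part of aprsc: `modified_after` is a hypothetical helper name, and the caller is assumed to have already converted the aprsc start-up time from aprsc.log into a `time_t`.

```c
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper (not from aprsc).
 * Returns 1 if `path` was modified after `process_start`,
 * 0 if it was not, and -1 if stat() fails (for example, file missing).
 */
int modified_after(const char *path, time_t process_start)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    /* st_mtime is the same value that stat(1) prints as "Modify:" */
    return st.st_mtime > process_start ? 1 : 0;
}
```

A result of 1 for `modified_after("/etc/resolv.conf", aprsc_start)` would suggest that the resolver configuration was rewritten after aprsc had already started, which is the situation described above.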

hessu avatar May 08 '22 13:05 hessu

Hi,

I accidentally bumped into a description of a glibc bug, where resolv.conf is not reloaded after it changes. Debian-based distributions have had a patch present in glibc to work around this already for some time.

If the system comes up without a working resolver config, and aprsc starts before that, and resolv.conf is then updated to contain correct DNS service config a few seconds later, the bug would surely work like you described. A restart of aprsc would get it to reread resolv.conf and DNS resolution would work.

I added a workaround in aprsc code now to reinit the stub resolver if resolv.conf changes, it'll be present in the next release.
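
A minimal sketch of what such a workaround could look like follows. It is not the actual aprsc change: the function name `reinit_resolver_if_changed` and the choice of comparing modification times are assumptions made for illustration. `res_init()` is the standard resolver call declared in `<resolv.h>`.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <time.h>

/* Last modification time seen for the watched file; 0 means "not checked yet". */
static time_t last_resolv_mtime = 0;

/*
 * Hypothetical helper (not from aprsc): call before each name lookup.
 * `path` is the file to watch, normally "/etc/resolv.conf". Note that
 * res_init() itself always rereads the system resolver configuration,
 * whatever `path` is.
 * Returns 1 if the stub resolver was re-initialised, 0 if nothing
 * changed, -1 if the file could not be examined or res_init() failed.
 */
int reinit_resolver_if_changed(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    if (st.st_mtime == last_resolv_mtime)
        return 0;

    last_resolv_mtime = st.st_mtime;
    return res_init() == 0 ? 1 : -1;
}
```

Calling `reinit_resolver_if_changed("/etc/resolv.conf")` before each uplink lookup would cost one `stat()` call per lookup. In a multi-threaded program the static timestamp would also need to be protected by a lock, which this sketch leaves out.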

hessu avatar Oct 23 '22 22:10 hessu