uplink no-tunnel does not fail over within a reasonable amount of time
When a node is set up with an uplink type of no-tunnel, a problem exists when the local uplink (Fritz box) goes offline. No tests are performed on the node to check if the ffuplink interface actually has a connection to the internet. This prevents the node from switching over to the OLSR smart gw.
This has a lot to do with the DHCP lease time of the Fritz box in a power failure scenario. In my local tests, it took up to 3 times the DHCP lease time before the ffuplink interface decided it was offline. With a default DHCP lease time of 1 week on the Fritz boxes, this can result in an extremely long wait before failover.
If the Fritz box is still powered but the DSL connection is simply not working, then even the DHCP timeout will not help.
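To put a number on that, here is a small worked calculation. It assumes the factor of 3 observed in the tests above and the 1 week default lease; the function names are only illustrative.

```sh
#!/bin/sh
# Worst-case wait before ffuplink is considered offline, assuming it
# takes up to MULTIPLIER lease periods (3 in the tests described above).
worst_case_wait() {
    lease_seconds="$1"
    multiplier="$2"
    echo $((lease_seconds * multiplier))
}

# Convert seconds to whole days.
seconds_to_days() {
    echo $(($1 / 86400))
}

LEASE=$((7 * 24 * 3600))            # 1 week = 604800 s
WAIT=$(worst_case_wait "$LEASE" 3)
echo "worst case: ${WAIT} s ($(seconds_to_days "$WAIT") days)"
```

With these assumptions the script prints `worst case: 1814400 s (21 days)`, i.e. up to three weeks without failover.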
This is not an issue with the tunnel variations because the tunnel software (openvpn and tunneldigger) is in constant communication with the server to maintain state.
The OLSR daemon checks (by default) every 30 seconds whether the internet is reachable. So, even though the local clients are still being forwarded over the ffuplink interface, the smart gw announcements stop relatively quickly and neighbours do not use the non-functioning uplink any more.
Like the OLSR daemon, perhaps it would be wise to set up some kind of ping watchdog which will, at regular intervals, check whether the internet is reachable.
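A minimal sketch of such a reachability check could look like the following. The function names `probe` and `uplink_reachable` are hypothetical, and the 2 second timeout (`-W 2`) is an assumption, not something taken from the firmware. The probe is kept in its own function so the loop logic can be exercised without a network.

```sh
#!/bin/sh
# Send a single ping to SERVER through interface IFACE.
# Returns 0 if a reply arrives within 2 seconds (assumed timeout).
probe() {
    ping -c 1 -W 2 -q -I "$1" "$2" > /dev/null 2>&1
}

# Return 0 as soon as any of the given servers answers via IFACE,
# 1 if none of them does.
uplink_reachable() {
    iface="$1"
    shift
    for server in "$@"; do
        if probe "$iface" "$server"; then
            return 0
        fi
    done
    return 1
}

# Example call (the addresses are placeholders from the documentation range):
#   uplink_reachable ffuplink 192.0.2.1 192.0.2.2 || echo "uplink is down"
```

Checking several servers and stopping at the first reply avoids declaring the uplink down just because a single test host is unreachable.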
I have written a script which will test and take down or up the ffuplink interface accordingly. The OLSRd smart gw is active if the ping test fails on ffuplink. Otherwise, the direct route to the internet is active.
But there is still one issue which I am not sure how to resolve. If the connection to the internet via ffuplink doesn't work (FritzBox is on, but doesn't connect to the internet) the DNS queries are still being sent over WAN (which also doesn't have internet). No hostnames are resolving. I'm not sure how to get around this problem.
If anyone has any ideas how to get around the DNS issue, please comment. This is now a separate issue (#660).
#!/bin/sh
. /lib/functions.sh
STATUS_FILE="/tmp/ffuplink_status"
[ -f "$STATUS_FILE" ] && STATUS=$(cat "$STATUS_FILE") || STATUS="up"
COMMUNITY=$(uci get freifunk.community.name)
SERVERS=$(uci get profile_${COMMUNITY}.interface.dns)
NEW_STATUS="down"
# if the old status is up, check that ffuplink is working
if [ "$STATUS" = "up" ]; then
    for server in $SERVERS; do
        # only test ipv4 addresses (four dot-separated fields)
        if [ -z "$(echo "$server" | awk -F. 'NF == 4')" ]; then
            continue
        fi
        # ping test
        if ping -c 1 -q -I ffuplink "$server" > /dev/null; then
            NEW_STATUS="up"
            break
        fi
    done
    # take down ffuplink if the ping test failed
    if [ "$NEW_STATUS" = "down" ]; then
        ifdown ffuplink
    fi
# if the old status is down, check that wan is working
else
    for server in $SERVERS; do
        # only test ipv4 addresses (four dot-separated fields)
        if [ -z "$(echo "$server" | awk -F. 'NF == 4')" ]; then
            continue
        fi
        # ping test
        if ping -c 1 -q -I br-wan "$server" > /dev/null; then
            NEW_STATUS="up"
            break
        fi
    done
    # bring up ffuplink if the ping test passed
    if [ "$NEW_STATUS" = "up" ]; then
        ifup ffuplink
    fi
fi
# remember the result for the next run
echo "$NEW_STATUS" > "$STATUS_FILE"
I recommend running this script from crontab every minute (or perhaps every two).
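As a rough illustration, a crontab entry for a one-minute interval could look like this. The script path `/usr/sbin/ffuplink-watchdog.sh` is a hypothetical name; use wherever the script was actually saved. On OpenWrt the entry would typically go into root's crontab (`crontab -e`).

```
# m h dom mon dow  command
* * * * * /usr/sbin/ffuplink-watchdog.sh
```

For a two-minute interval, the first field would be `*/2` instead of `*`.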
The issue with DNS is relevant to all ffuplink types; therefore I have made it into a separate issue (#660).
Do you think it would be good to integrate issue #610 into your script? Thus there would not be two scripts using ping every minute.
Please see https://github.com/freifunk-berlin/firmware/issues/660#issuecomment-466715354