uplink no-tunnel does not failover within a reasonable amount of time

Open pmelange opened this issue 6 years ago • 4 comments

When a node is set up with an uplink type of no-tunnel, a problem exists when the local uplink (Fritz box) goes offline. No tests are performed on the node to check if the ffuplink interface actually has a connection to the internet. This prevents the node from switching over to the OLSR smart gw.

This has a lot to do with the DHCP lease time of the Fritz box in a power-failure scenario. In my local tests, it took up to 3 times the DHCP lease time before the ffuplink interface decided it was offline. With a default DHCP lease time of 1 week on the Fritz boxes, this can result in an extremely long wait before failover.
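To put a number on that worst case, here is a small sketch of the arithmetic. The variable names are made up for illustration; the 1-week lease and the factor of 3 are the values reported above.

```sh
#!/bin/sh
# Worst-case wait before ffuplink is considered offline,
# using the figures from the report above.
lease_days=7     # default lease time reported for the Fritz box (1 week)
multiplier=3     # "up to 3 times the DHCP lease time" observed in local tests
worst_case_days=$((lease_days * multiplier))
echo "worst case: ${worst_case_days} days"
```

So a node could keep routing clients into a dead uplink for up to three weeks.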

And if the Fritz box is still powered but its DSL connection is down, even the DHCP timeout will not help.

This is not an issue with the tunnel variations because the tunnel software (openvpn and tunneldigger) is in constant communication with the server to maintain state.


The OLSR daemon checks (by default) every 30 seconds whether the internet is reachable. So, even though the local clients are still being forwarded over the ffuplink interface, the smart gw announcements stop relatively quickly and neighbours do not use the non-functioning uplink any more.

Like the OLSR daemon, perhaps it would be wise to set up some kind of ping watchdog which will, in regular intervals, check to see if the internet is reachable.

pmelange avatar Jan 24 '19 10:01 pmelange

I have written a script which will test and take down or up the ffuplink interface accordingly. The OLSRd smart gw is active if the ping test fails on ffuplink. Otherwise, the direct route to the internet is active.

But there is still one issue which I am not sure how to resolve. If the connection to the internet via ffuplink doesn't work (FritzBox is on, but doesn't connect to the internet) the DNS queries are still being sent over WAN (which also doesn't have internet). No hostnames are resolving. I'm not sure how to get around this problem.

If anyone has any ideas how to get around the DNS issue, please comment. This is now a separate issue (#660).

#!/bin/sh

. /lib/functions.sh

STATUS_FILE="/tmp/ffuplink_status"

# previous state; assume "up" on the first run
[ -f "$STATUS_FILE" ] && STATUS=$(cat "$STATUS_FILE") || STATUS="up"

COMMUNITY=$(uci get freifunk.community.name)
SERVERS=$(uci get "profile_${COMMUNITY}.interface.dns")

NEW_STATUS="down"
# if the old status is up, check that ffuplink is working
if [ "$STATUS" = "up" ]; then
  for server in $SERVERS; do
    # only test ipv4 addresses (four dot-separated fields)
    if [ -z "$(echo "$server" | awk -F. 'NF == 4')" ]; then
      continue
    fi

    # ping test: one reply from any server is enough
    if ping -c 1 -q -I ffuplink "$server" > /dev/null; then
      NEW_STATUS="up"
      break
    fi
  done

  # take down ffuplink if the ping test failed
  if [ "$NEW_STATUS" = "down" ]; then
    ifdown ffuplink
  fi

# if the old status is down, check that wan is working
else
  for server in $SERVERS; do
    # only test ipv4 addresses (four dot-separated fields)
    if [ -z "$(echo "$server" | awk -F. 'NF == 4')" ]; then
      continue
    fi

    # ping test: one reply from any server is enough
    if ping -c 1 -q -I br-wan "$server" > /dev/null; then
      NEW_STATUS="up"
      break
    fi
  done

  # bring up ffuplink if the ping test passed
  if [ "$NEW_STATUS" = "up" ]; then
    ifup ffuplink
  fi
fi

# remember the result for the next run
echo "$NEW_STATUS" > "$STATUS_FILE"
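The IPv4 filter appears in both loops of the script above, so it could be factored into a small helper. The following is only a sketch: `is_ipv4` is a hypothetical name, and it uses a `case` pattern instead of `awk`. Like the original test it only counts dot-separated fields and does not validate the octets. It is slightly stricter in that it also rejects anything containing a colon.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original script): succeeds if the
# argument has exactly four dot-separated fields and contains no colon.
is_ipv4() {
  case "$1" in
    *:*)       return 1 ;;  # contains a colon: treat as IPv6
    *.*.*.*.*) return 1 ;;  # five or more fields
    *.*.*.*)   return 0 ;;  # exactly four fields
    *)         return 1 ;;  # fewer than four fields
  esac
}

# Usage: skip non-IPv4 entries, as the loops in the script do.
# The addresses are documentation examples, not real DNS servers.
for server in 192.0.2.1 2001:db8::1; do
  is_ipv4 "$server" || continue
  echo "would ping $server"
done
```

With the two example addresses, only `192.0.2.1` reaches the `echo` line.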

I recommend running this script from crontab every minute (or perhaps every two).
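As a sketch, a crontab entry for this could look like the one below. The script path is an assumption, since the thread does not say where the script is installed. On OpenWrt-based firmware the root crontab normally lives in /etc/crontabs/root.

```
# Run the ffuplink check every minute.
# /usr/local/bin/ffuplink-watchdog.sh is a hypothetical install path.
* * * * * /usr/local/bin/ffuplink-watchdog.sh

# Alternative: every two minutes (use instead of the line above).
# */2 * * * * /usr/local/bin/ffuplink-watchdog.sh
```

The script keeps its state in /tmp/ffuplink_status, so each run only performs the ping test for the interface that matters in the current state.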

pmelange avatar Jan 24 '19 14:01 pmelange

The issue with DNS is relevant to all ffuplink types, therefore I have made it into a separate issue (#660)

pmelange avatar Jan 24 '19 16:01 pmelange

Do you think it would be good to integrate issue #610 into your script? Thus there would not be two scripts using ping every minute.

akira25 avatar Jan 26 '19 16:01 akira25

Please see https://github.com/freifunk-berlin/firmware/issues/660#issuecomment-466715354

pmelange avatar Feb 24 '19 00:02 pmelange