
Delay needed before wifi_device.get_access_points (on Raspberry Pi 3)

Open kneh opened this issue 3 years ago • 6 comments

On a Raspberry Pi 3 (onboard wifi) I repeatedly could not connect to the AP I selected in the web UI, because the access point list was requested too soon after a change in configuration. I think that's a bug in the network manager (library). I added a delay of 10s (see diff below) and now it always works. Would it not be a good idea to make this a command-line option for as long as this isn't fixed in the network manager?

diff --git a/src/network.rs b/src/network.rs
index fee31c5..7946cdc 100644
--- a/src/network.rs
+++ b/src/network.rs
@@ -365,6 +365,7 @@ fn get_access_points_impl(device: &Device) -> Result<Vec<AccessPoint>> {
     // After stopping the hotspot we may have to wait a bit for the list
     // of access points to become available
     while retries < retries_allowed {
+        ::std::thread::sleep(::std::time::Duration::from_secs(10));
         let wifi_device = device.as_wifi_device().unwrap();
         let mut access_points = wifi_device.get_access_points()?;

kneh avatar Mar 16 '21 20:03 kneh

Hello, I think I'm having the same issue. Forgive my noobie question, but it's unclear to me how to add a 10sec delay. Could I put a delay in the wifi-connect-start.service script?

romarcin avatar Mar 23 '21 17:03 romarcin

If it's the same issue, then I don't think so: it's this Rust code that is calling the network-manager functions too fast. For me it was the first time I had to work with Rust code. I had to add the delay at the beginning of the while loop in the get_access_points_impl function. Then I had to recompile the Rust code:

cargo build --release

Kind regards,

Henk


kneh avatar Mar 23 '21 20:03 kneh

Thank you for your reply. It was the delay. Thanks again. Cheers

romarcin avatar Mar 24 '21 15:03 romarcin

I think this is needed, because I can't get to other networks if I'm already connected to one that exists in the current pool (think same building, different network name). It only gives the first network or the last one it connected to. Please see ticket

jsib0 avatar Apr 15 '21 21:04 jsib0

Thanks for this solution @kneh! Unfortunately, it did not work for me. I tried various delays, but the access point list still sometimes contains only the connected network. I also think it is an issue in NetworkManager, since the same sometimes happens with nmcli d wifi, see https://askubuntu.com/questions/1193642/output-of-nmcli-device-wifi-list-is-often-nearly-empty-on-ubuntu-18-04 .

I think the best solution is to just restart the hotspot when this issue occurs, so I ended up with a check on the output of wifi-connect. At some point the logs contain a list of the access points. I simply made a Python script that checks this list: if it is empty or contains only one SSID, it restarts wifi-connect. Works like a charm!

import os
import pty
import subprocess
import time
from threading import Thread


# config
maxHotspotRetries = 5

# runtime vars
hotspotFailedCounter = 0


def getHotspotName():
    return "hotspotname"


def runHotspot():
    # remove old hotspotconnection, to make sure it won't conflict with the new hotspot
    os.system('nmcli connection down id "'+getHotspotName()+'" || true')
    os.system('nmcli connection delete id "'+getHotspotName()+'" || true')


    # we need to use pseudoterminal since wifi-connect outputs to tty, thus we can't read stdout
    # -> READ THIS: https://stackoverflow.com/a/42008071
    master, slave = pty.openpty()
    hotspotProcess = subprocess.Popen(["/usr/src/app/wifi-connect", "-s", getHotspotName()], stdin=subprocess.PIPE,
                                      stdout=slave, stderr=slave, close_fds=True)
    # the code above duplicates the slaveFD for use with the hotspotProcess, so both slaves need to be closed to make sure os.read (see below) stops (with OSError).
    os.close(slave)


    readLine = ""
    while True:
        try:
            read = str(os.read(master, 1), 'utf-8')
        except OSError: # the process has ended
            break
        if read is not None:
            if read == '\n':
                print(readLine)

                if readLine.find("User connected to the captive portal") != -1:
                    # we can do something like fire an event: userHasInteractionWithHotspotEvent()
                    print("user connected")


                # because of some strange bug of NMCLI sometimes only the connected wifi-network is returned.
                # If that is the case: restart
                if readLine.find("Access points") != -1:
                    if readLine.count(',') == 0: # zero or one ssids found, not good: restart hotspot
                        global hotspotFailedCounter
                        global maxHotspotRetries

                        if hotspotFailedCounter < maxHotspotRetries:
                            hotspotFailedCounter = hotspotFailedCounter + 1
                            print("zero or one ssids found; not good: restart hotspot")
                            time.sleep(5) # add a delay, so that the hotspot is started first; this may trigger a rescan when the hotspot is stopped again
                            hotspotProcess.terminate() # stop the hotspot process
                            print("process stopped, restarting in 5 seconds")
                            time.sleep(5)
                            t = Thread(target=runHotspot) # restart this function in new thread
                            t.start() 
                            return # end this instance of runHotspot
                        else:
                            # if still the case after maxHotspotRetries -> continue, there is probably only one ssid
                            print("not many hotspots found, but continue after "+str(maxHotspotRetries)+" retries")
                        hotspotFailedCounter = 0


                readLine = ""
            else:
                readLine = readLine + read
        else:
            break


runHotspot()

Disclaimer: I stripped this snippet from my working code and did not test the snippet itself.
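
The restart condition in the script above comes down to counting commas in the "Access points" log line. A minimal sketch of that check as a standalone function, assuming (as the script does) that the log line lists SSIDs separated by commas; needs_restart is a hypothetical name and not part of wifi-connect:

```python
def needs_restart(log_line: str) -> bool:
    """Return True when an 'Access points' log line lists at most one SSID.

    Assumption: SSIDs on that line are comma-separated, as in the script above.
    """
    if "Access points" not in log_line:
        return False  # not the line we are interested in
    # Same heuristic as the script: no comma means zero or one SSID was found.
    return log_line.count(",") == 0
```

Note that an SSID which itself contains a comma would be miscounted by this heuristic, so it is only a rough check.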

avandermeer avatar Apr 21 '21 10:04 avandermeer

I think the simplest solution is to just add the lines:

//scan for new networks
let wifi_device = self.device.as_wifi_device().unwrap();
wifi_device.request_scan();
thread::sleep(Duration::from_secs(2));

The code would look like this:

impl NetworkCommandHandler {
    fn new(config: &Config, exit_tx: &Sender<ExitResult>) -> Result<Self> {
        let (network_tx, network_rx) = channel();

        Self::spawn_trap_exit_signals(exit_tx, network_tx.clone());

        let manager = NetworkManager::new();
        debug!("NetworkManager connection initialized");

        let device = find_device(&manager, &config.interface)?;
        // scan for new networks
        let wifi_device = device.as_wifi_device().unwrap();
        wifi_device.request_scan();
        thread::sleep(Duration::from_secs(2));

        let access_points = get_access_points(&device)?;

        let portal_connection = Some(create_portal(&device, config)?);

        let dnsmasq = start_dnsmasq(config, &device)?;

        let (server_tx, server_rx) = channel();

        Self::spawn_server(config, exit_tx, server_rx, network_tx.clone());

        Self::spawn_activity_timeout(config, network_tx.clone());

        let config = config.clone();
        let activated = false;

        Ok(NetworkCommandHandler {
            manager,
            device,
            access_points,
            portal_connection,
            config,
            dnsmasq,
            server_tx,
            network_rx,
            activated,
        })
    }

As @avandermeer mentioned, this is a problem in NetworkManager. You can test this in the console with:

nmcli device wifi list
sudo nmcli device wifi rescan
nmcli device wifi list

At first you should only see the network you are currently connected to. After the rescan, all networks should be listed. I don't know how to code in Rust, so my solution is not the best. The documentation says that you should check the LastScan property to see whether RequestScan() has finished. Maybe someone who knows Rust better than me can implement this. I just added a sleep after the rescan. It should find all networks within 2 seconds. Maybe one second would also be enough.
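
For illustration, waiting on LastScan instead of sleeping for a fixed time could look roughly like the sketch below. It is written in Python rather than Rust and is not wifi-connect code: wait_for_scan and read_last_scan are hypothetical names, and how LastScan is actually read (for example over D-Bus) is left to the caller-supplied function.

```python
import time
from typing import Callable


def wait_for_scan(read_last_scan: Callable[[], int],
                  previous: int,
                  timeout: float = 10.0,
                  poll_interval: float = 0.2) -> bool:
    """Poll until LastScan differs from the value seen before the scan request.

    read_last_scan is a hypothetical callable returning the current LastScan
    value; previous is the value read before the scan was requested.
    Returns True once the value changes, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_last_scan() != previous:
            return True  # a new scan has completed
        if time.monotonic() >= deadline:
            return False  # gave up waiting
        time.sleep(poll_interval)
```

The caller would read LastScan once, request the scan, and then call wait_for_scan with the old value. On a False result it could fall back to the fixed sleep.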

Problem: NetworkManager has a timeout of 10 seconds for the rescan. This means that if you restart the program within 10 seconds, it will raise an error. But I think someone who knows what they are doing can easily catch this error. Also, it is very unlikely that someone wants to restart the program within such a short time. There are also ways to deactivate this timeout. The program won't crash and operates normally after the error occurs; it only means that the rescan was not performed at the start of the program.
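
As a rough sketch of "catching this error", shown from the command line side rather than in the Rust code: the helper below requests a rescan through nmcli and reports failure instead of raising. try_rescan is a hypothetical name, and the sketch assumes nmcli exits with a non-zero status when the rescan is refused. Unlike the console test above it does not use sudo, so it may need sufficient privileges.

```python
import subprocess
from typing import Callable, Sequence


def try_rescan(run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Request a wifi rescan; return False instead of raising if it fails.

    run is injectable for testing and defaults to subprocess.call, which
    returns the exit status of the command.
    """
    try:
        # Assumption: a refused rescan shows up as a non-zero exit status.
        return run(["nmcli", "device", "wifi", "rescan"]) == 0
    except OSError:
        return False  # nmcli is not installed or could not be executed
```

A caller can simply continue with the existing access point list when this returns False, which matches the behaviour described above.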

Levios1 avatar May 18 '21 14:05 Levios1