wifi-connect
Captive portal does not work on BalenaOS 2.46.1 (Supervisor 10.x)
I had this project working nicely with my other projects, but recently the captive portal no longer comes up. When my phone connects to the wifi portal, it indicates "connected", then "obtaining ip address", then "no internet", but never brings up the captive portal. This was working a few weeks ago. I'm trying to debug it but can't figure out what has changed to stop it working.
Any hints how to debug this?
So it looks like wifi-connect does not work properly on Supervisor versions 10.x, whereas it works fine with Supervisor 9.x versions of BalenaOS.
@marclennox sorry for not responding earlier, had to catch up on a lot of other fronts. I am going to test this tomorrow. This does not sound good.
Thanks @majorz, I can definitely confirm that wifi-connect works fine on BalenaOS 2.38.0 (both Raspberry Pi 3 and Balena Fin 1.0), but does not work on BalenaOS 2.46.1 (Raspberry Pi 3). The failure mode is that the captive portal just doesn't come up after connecting and obtaining an IP address.
That's bad. We test WiFi Connect before each balenaOS release as part of our stability tests, but probably this slipped through the cracks somehow. I will test this first thing tomorrow morning in a few hours.
Thanks @majorz, look forward to hearing what you find.
@marclennox I was not able to reproduce.
I followed minimal steps:
- Created a new empty application wifi-connect on the dashboard
- Flashed a balenaOS 2.46.1+rev1 RPi 3 image (the default 32-bit version, not the 64-bit beta one)
- Cloned the WiFi Connect repo
- Logged in with our CLI - balena login
- From the root of the repo I did balena push wifi-connect to push the code to the newly created application
- Waited for the image to be downloaded and started testing
I did numerous tests with both RPi 3 B and B+, but the captive portal always showed correctly.
Can you please repeat the above minimal steps and let me know how that goes for you? This would help in narrowing down the issue on your side.
I will. Note, however, that I'm using it in a multi-container application, with privileged: true.
Internally all single-container applications are processed as multi-container ones - a docker-compose.yml exists with privileged: true, network_mode: host, etc. So it is probably not related, but let's see where it starts to break.
OK @majorz, I figured out what the issue is.
In my wifi-connect Dockerfile, I've added network-manager to the list of installed packages, so that I can use the nmcli command to check for an active network.
If I take the stock wifi-connect project, it works fine for me on 2.46. If I simply add network-manager to the package list, the captive portal no longer comes up on my phone after connecting to the WiFi Connect SSID.
It should be noted that on 2.38, WiFi Connect works properly regardless of having network-manager added to the installed package list.
Thanks, I will try that. Just to confirm - are you using the newer balenalib images, or the older resin ones? As in FROM balenalib/%%RESIN_MACHINE_NAME%%-debian.
FROM balenalib/%%BALENA_MACHINE_NAME%%-debian:latest
Same result if I use
FROM balenalib/%%RESIN_MACHINE_NAME%%-debian
Works fine without network-manager, fails to bring up the captive portal with network-manager installed.
@marclennox I cannot reproduce that, it works for me. Also, those should not be related: installing network-manager in the container on the balenalib base images should have no effect on wifi-connect, as it communicates through D-Bus with NetworkManager, the service running on the host OS. It has no relation to the libraries installed with NetworkManager in the container.
Very strange. It is 100% reproducible for me. I'm using a multi-container deployment for my testing. Will try with a single container just in case.
Is there a way I can turn on debugging logs to get more info from wifi-connect to see what's failing?
For multi-container make sure it has privileged: true and network_mode: host.
Yep it does
I see. Please go with the precise steps I provided above. Nothing more or less. And then you may start modifying from that state to see where it breaks, so that I can reproduce on my side as well. Unfortunately, for the problem you describe there are currently no logs that can be enabled. Usually logs can be retrieved from the host OS for NetworkManager with journalctl, but it will look fine on that side, since the problem occurs at a later stage. Debugging the kind of problem you describe would require capturing packets with tcpdump and it will be rather hard for you. It will be best if I can reproduce on my side.
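For anyone who wants to try the two diagnostics mentioned here, a minimal sketch follows. It is not part of wifi-connect: the function names are hypothetical, the systemd unit name NetworkManager and the port list (DHCP, DNS, HTTP) are assumptions, wlan0 is taken from the logs in this thread, and tcpdump has to be available wherever you run it.

```shell
#!/bin/sh
# Hypothetical helpers; intended for the host OS, not the container.

# Print the last N lines of the NetworkManager journal
# (assumes the systemd unit is named NetworkManager).
nm_logs() {
    journalctl -u NetworkManager --no-pager -n "${1:-200}"
}

# Write DHCP, DNS and HTTP traffic seen on the access-point interface
# to a pcap file for later inspection. The port list is an assumption
# about what a captive-portal handshake involves.
capture_portal_traffic() {
    iface="${1:-wlan0}"
    out="${2:-/tmp/portal.pcap}"
    tcpdump -i "$iface" -n -w "$out" 'port 67 or port 68 or port 53 or port 80'
}
```

Calling nm_logs 100 prints the most recent 100 journal lines; capture_portal_traffic wlan0 /tmp/portal.pcap runs until interrupted with Ctrl-C.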
Well, I now was able to reproduce the issue without having network-manager package installed. Rebooting the device then made it work properly. In looking at the logs, I see the following logs when it works properly.
10.01.20 08:32:23 (-0500) main Starting WiFi Connect
10.01.20 08:32:23 (-0500) main Deleting already created by WiFi Connect access point connection profile: "WiFi Connect"
10.01.20 08:32:23 (-0500) main WiFi device: wlan0
10.01.20 08:32:24 (-0500) main Access points: ["HUAWEI-3991", "Home Guest", "Home", "Home Guest", "Home", "NETGEAR58", "Home", "Home Guest", "TrackYourAssets!", "Home Guest", "Home", "Home", "Home Guest", "Home Guest", "Home", "Home", "Home Guest"]
10.01.20 08:32:24 (-0500) main Starting access point...
23.01.20 13:54:20 (-0500) main Access point 'WiFi Connect' created
23.01.20 13:54:20 (-0500) main Starting HTTP server on 192.168.42.1:80
23.01.20 13:54:58 (-0500) main User connected to the captive portal
And the following logs when it doesn't work
23.01.20 13:57:57 (-0500) main Starting WiFi Connect
23.01.20 13:57:57 (-0500) main Deleting already created by WiFi Connect access point connection profile: "WiFi Connect"
23.01.20 13:57:57 (-0500) main WiFi device: wlan0
23.01.20 13:57:57 (-0500) main Access points: ["WiFi Connect"]
23.01.20 13:57:57 (-0500) main Starting access point...
23.01.20 13:58:00 (-0500) main Access point 'WiFi Connect' created
23.01.20 13:58:00 (-0500) main Starting HTTP server on 192.168.42.1:80
So it feels like this might be related to https://github.com/balena-io/wifi-connect/issues/327
It seems that the device gets "stuck" in a state where the portal is activated, so if the process restarts, it only sees the portal SSID, and that's when things go bad.
Adding the following before calling wifi-connect seems to make the problem go away:
nmcli connection down id "WiFi Connect" || true
nmcli connection delete id "WiFi Connect" || true
I see, the problem occurs when a "WiFi Connect" profile already exists, e.g. because of a power cycle. I will test this out.
Correct. I think what might also exacerbate the problem in my particular setup is that I use the timeout option, then (in a loop) restart wifi-connect.
I have been able to build a fairly bullet-proof script using nmcli that all but eliminates this problem for me, so for now I have a very viable workaround.
@marclennox have you added any more code to your script, or does just running the following before wifi-connect fix the issue for you?
nmcli connection down id "WiFi Connect" || true
nmcli connection delete id "WiFi Connect" || true
@meech-ward I've made the script a little more robust (dealing with a possible failure of each nmcli call), but yes, that's basically all I'm doing before launching wifi-connect, and it has solved the issue for me.
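A sketch of what such a more defensive cleanup could look like is given below. This is not @marclennox's actual script: the function name, the PORTAL_PROFILE and NMCLI variables, and the retry count are all hypothetical. It assumes the profile is named "WiFi Connect", as in the logs above, and that nmcli is installed in the container.

```shell
#!/bin/sh
# Hypothetical helper: remove a stale access-point profile before
# starting wifi-connect. Names below are illustrative only.
PORTAL_PROFILE="${PORTAL_PROFILE:-WiFi Connect}"
NMCLI="${NMCLI:-nmcli}"

cleanup_portal_profile() {
    attempts="${1:-3}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Deactivate the profile; failure usually means it was not active.
        "$NMCLI" connection down id "$PORTAL_PROFILE" >/dev/null 2>&1 || true
        # Delete the profile; failure usually means it did not exist.
        "$NMCLI" connection delete id "$PORTAL_PROFILE" >/dev/null 2>&1 || true
        # Stop once NetworkManager no longer lists the profile by name.
        if ! "$NMCLI" -t -f NAME connection show | grep -Fxq "$PORTAL_PROFILE"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    # The profile is still present after all attempts.
    return 1
}
```

A start script would call cleanup_portal_profile before launching wifi-connect and could log a warning if it returns non-zero, rather than silently continuing.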
Hi, it seems like I'm having the same problem on balenaOS 2.44.0+rev3.
I have cloned this repo, built it for rpi3 with balena and deployed it in a multi-container application with network_mode: host and privileged: true.
It generates the AP, but never opens the portal. For testing I am using an rpi3 as hardware, and a Xiaomi and a MacBook Pro for the portal.