[Bug]: MQTT Client Proxy cannot be enabled in firmware 2.7.15 - proxy forced on when MQTT enabled - nRF52-based boards - 4631
Category
Other
Hardware
Rak4631
Is this bug report about any UI component firmware like InkHUD or Meshtastic UI (MUI)?
- [ ] Meshtastic UI aka MUI colorTFT
- [ ] InkHUD ePaper
- [ ] OLED slide UI on any display
Firmware Version
2.7.15
Description
In firmware 2.7.15 (and alpha 2.7.16), the MQTT Client Proxy setting is forced on whenever MQTT is enabled.
Hardware:
- RAK19007 + RAK4631 + RAK13800 (connected to the LAN with internet access. Bluetooth is deactivated, but that doesn't matter: I tested with and without it)
Enabling MQTT without proxy
- Start with MQTT disabled
- Enable MQTT module without enabling Client Proxy
- Save configuration
- Check settings
MQTT becomes disabled again automatically
Disabling proxy when MQTT (and proxy) are enabled
- Start with MQTT already enabled (proxy is on)
- Attempt to disable "Client Proxy" through any interface:
Mobile app, web client, and CLI: meshtastic --set mqtt.proxy_to_client_enabled false (connecting to the node's IP, and over serial too)
- Save configuration
- Check settings
Proxy remains enabled, cannot be toggled off
I specifically bought one node with Wi-Fi and another with the Ethernet module so that they could handle MQTT on their own (I planned to set up a private MQTT broker for ourselves), and so that we could connect to them from the "network" tab of the app to send and receive messages.
The devices can run as unattended Ethernet gateways, or we can connect to them to send and check messages.
PS. I searched for a similar bug, but I couldn't find it so I hope it's not a duplicate.
Key points
- TLS doesn't work reliably on nRF52 boards (RAK4631, etc.)
- Proxy client loops every second with constant reconnect when using TLS
- Private encrypted channels don't uplink without TLS
- ESP32 boards seem to work fine with TLS (ESP32-S3 confirmed working, but this little board has other problems and isn't stable for other things, like Wi-Fi).
Relevant log output
My hunch is that you aren't running the "eth_gw" firmware. If you have your radio plugged into a computer over serial, run either the serial monitor or meshtastic --noproto and I bet you'll see the line "Invalid MQTT config: proxy_to_client_enabled must be enabled on nodes that do not have a network".
Try flashing again with 2.7.15 and the Eth_GW firmware and see if that fixes it.
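To make the suspected behavior concrete, here is a minimal sketch of the rule that the quoted log line implies. It is not the firmware's code: the `MqttSettings` class, the `validate_mqtt` function, and the `has_network` flag are hypothetical names, and the assumption is simply that a build without a network interface rejects an MQTT config that has the proxy turned off.

```python
from dataclasses import dataclass


@dataclass
class MqttSettings:
    # Hypothetical stand-in for the MQTT module config; only the two
    # fields discussed in this issue are modeled.
    enabled: bool
    proxy_to_client_enabled: bool


def validate_mqtt(settings: MqttSettings, has_network: bool) -> tuple[bool, str]:
    """Return (is_valid, message) for the rule implied by the log line.

    Assumption: `has_network` is False on a firmware build that has no
    Wi-Fi or Ethernet support compiled in (for example, a non-gateway build).
    """
    if not settings.enabled:
        return True, "MQTT disabled, nothing to validate"
    if not has_network and not settings.proxy_to_client_enabled:
        # Message text copied from the log line quoted above.
        return False, (
            "Invalid MQTT config: proxy_to_client_enabled must be enabled "
            "on nodes that do not have a network"
        )
    return True, "ok"


if __name__ == "__main__":
    # MQTT on, proxy off, no network support in the build: rejected.
    print(validate_mqtt(MqttSettings(True, False), has_network=False))
    # Same settings on a build with network support: accepted.
    print(validate_mqtt(MqttSettings(True, False), has_network=True))
```

If a rule like this is what rejects the config, it would explain why the setting appears to snap back with no visible error in the app.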
Edit:
I believe this was simply a connectivity issue caused by a small error in a firewall rule.
Our nodes are on an isolated iot network with no internet access and can't access other networks either. We had a firewall rule that was intended to allow only the Meshtastic devices to reach mqtt.meshtastic.org.
That rule had a little mistake, which has now been corrected: the source and destination ports were inverted.
If this is indeed what was happening, we should improve the user experience: when saving the MQTT settings, the node should verify that it can resolve the DNS name and actually connect to the server. If it cannot, it should display a clear error message instead of silently toggling the setting to "Off" (when the proxy option is not selected).
If it switches back to off, I guess it knows it can't connect (right?). So an informative error message to the user would be very useful.
It's difficult for me to be 100% certain that this is exactly what's happening, since my ability to test is limited and there are many possible variables involved. If anyone else runs into the same issue and sees this message: check your firewall rules, check whether your node can reach the internet, the DNS, and the server, and check whether your home network blocks the port or has the proper ports forwarded.
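For anyone who wants to run those checks quickly, here is a minimal sketch that does the DNS lookup and then a plain TCP connect. The `check_broker` function is a hypothetical helper, not part of any Meshtastic tool. It only tests reachability from the computer running it, so run it from a machine on the same network or VLAN as the node, and note that it does not attempt a TLS handshake or an MQTT login.

```python
import socket


def check_broker(host: str, port: int, timeout: float = 5.0) -> dict:
    """Check that `host` resolves and that a TCP connection to `port` opens.

    Returns a dict with `dns_ok`, `tcp_ok`, and `error` (None on success).
    """
    result = {"dns_ok": False, "tcp_ok": False, "error": None}

    # Step 1: DNS resolution (a literal IP address also passes this step).
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["dns_ok"] = True
    except socket.gaierror as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result

    # Step 2: plain TCP connect. A firewall drop usually shows up as a
    # timeout, a closed port as "connection refused".
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"

    return result
```

For example, `check_broker("mqtt.meshtastic.org", 1883)` checks the public server on the standard non-TLS MQTT port; replace the host and port with those of your own broker (8883 is the port used for TLS in this thread).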
One last thing: I'm not 100% confident about my findings, and maybe in 5 minutes I'll be going crazy again with the settings not sticking and no proper error messages lol.
Can you confirm whether you were able to save the config once the node could validate proper connectivity?
Oh man, no, I can't :/
It was working for a bit, so I figured ok I can set up this private MQTT server now lol
Didn't work. Wiped everything off the node, reinstalled it all from scratch, tried doing things in a specific order (even cleared all the cache and data on the phone). Couldn't get anything to set up properly from the CLI or web client.
So I started turning settings on one by one directly from the phone app.
At one point, everything magically worked: I could see messages flowing through my MQTT server, from the node that had the connection to the MQTT server and from the other nodes with uplink and downlink activated (and OK to MQTT) for each of our private channels.
Now I can't turn the proxy off again lol. It's stuck looping like crazy in the MQTT server logs, and sometimes it throws "out of memory" errors.
I have no idea how to debug this. I really should've left it alone when it was working for those 5 minutes, before I started to touch something. All my nodes were happily routing messages through MQTT just fine.
And yeah, I think it does the same thing even with the public server (meshtastic.org), but we can't know because there are no error messages, and we can't see the logs on their side.
Depending on what settings you toggle and in what order, you can somehow end up in a state where you literally can't turn the proxy off anymore. I'm sure it's the internet access again, but the subnetwork the node is on is (or should be) totally open.
But the fact remains after all of this that, in some instances, because of some settings set in some order, we can't just toggle off the proxy. I think this should be managed better, whatever someone's configuration is.
Below is what it looks like on the server side when the proxy loops (something we can't see from the app, so it could go on forever and nobody would know). It's just a sample; I disconnected the phone afterwards.
1764884340: Client MeshtasticAndroidMqttProxy-!c2xxxxxx disconnected due to out of memory.
1764884341: New connection from xx.xxx.xx.xxx:48686 on port 8883.
1764884341: New client connected from xx.xxx.xx.xxx:48686 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
1764884341: Client MeshtasticAndroidMqttProxy-!c2xxxxxx closed its connection.
1764884342: New connection from xx.xxx.xx.xxx:48688 on port 8883.
1764884342: New client connected from xx.xxx.xx.xxx:48688 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
1764884342: Client MeshtasticAndroidMqttProxy-!c2xxxxxx closed its connection.
1764884343: New connection from xx.xxx.xx.xxx:48692 on port 8883.
1764884344: New client connected from xx.xxx.xx.xxx:48692 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
1764884344: Client MeshtasticAndroidMqttProxy-!c2xxxxxx closed its connection.
1764884345: New connection from xx.xxx.xx.xxx:48694 on port 8883.
1764884345: New client connected from xx.xxx.xx.xxx:48694 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
1764884345: OpenSSL Error[0]: error:80000068:system library::Connection reset by peer
1764884345: Client MeshtasticAndroidMqttProxy-!c2xxxxxx disconnected due to out of memory.
1764884346: New connection from xx.xxx.xx.xxx:48696 on port 8883.
1764884347: New client connected from xx.xxx.xx.xxx:48696 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
1764884347: Client MeshtasticAndroidMqttProxy-!c2xxxxxx closed its connection.
1764884350: New connection from xx.xxx.xx.xxx:48700 on port 8883.
1764884350: New client connected from xx.xxx.xx.xxx:48700 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
1764884350: Client MeshtasticAndroidMqttProxy-!c2xxxxxx closed its connection.
1764884351: New connection from xx.xxx.xx.xxx:48702 on port 8883.
1764884351: New client connected from xx.xxx.xx.xxx:48702 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
1764884351: Client MeshtasticAndroidMqttProxy-!c2xxxxxx closed its connection.
1764884353: New connection from xx.xxx.xx.xxx:48704 on port 8883.
1764884353: New client connected from xx.xxx.xx.xxx:48704 as MeshtasticAndroidMqttProxy-!c2xxxxxx (p2, c1, k60, u'user_xxx').
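Since the loop is only visible in the broker log, a small script can make it easier to spot. The sketch below assumes the log has one entry per line in the same Mosquitto format as the sample above; `summarize` and the regular expressions are my own hypothetical helpers, written against those sample lines only, so other Mosquitto versions or log settings may need adjusted patterns.

```python
import re
from collections import Counter

# Patterns written against the sample log lines above (an assumption,
# not an official description of Mosquitto's log format).
PATTERNS = [
    ("connected", re.compile(r"^(\d+): New client connected from \S+ as (\S+) ")),
    ("closed", re.compile(r"^(\d+): Client (\S+) closed its connection\.")),
    ("dropped", re.compile(r"^(\d+): Client (\S+) disconnected due to (.+?)\.$")),
]


def summarize(lines):
    """Count connect/close/drop events per client ID.

    Returns {client_id: {"events": Counter, "first": ts, "last": ts}},
    where `first` and `last` are the Unix timestamps of the first and
    last matching entries for that client.
    """
    stats = {}
    for raw in lines:
        line = raw.strip()
        for kind, pattern in PATTERNS:
            match = pattern.match(line)
            if not match:
                continue
            ts, client = int(match.group(1)), match.group(2)
            entry = stats.setdefault(
                client, {"events": Counter(), "first": ts, "last": ts}
            )
            # Keep the reason for server-side drops, e.g. "out of memory".
            label = f"dropped: {match.group(3)}" if kind == "dropped" else kind
            entry["events"][label] += 1
            entry["first"] = min(entry["first"], ts)
            entry["last"] = max(entry["last"], ts)
            break
    return stats


if __name__ == "__main__":
    import sys

    # Usage (the path is whatever your broker logs to):
    #   python summarize_log.py /path/to/mosquitto.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        for client, entry in summarize(handle).items():
            span = entry["last"] - entry["first"]
            print(client, dict(entry["events"]), f"over {span}s")
```

A client that shows many "connected" events within a few seconds, each followed by "closed" or "dropped", is stuck in the reconnect loop described here.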
I configured the MQTT server to accept non-TLS connections just to see if things would work. It does work (just totally insecure), but the proxy on mobile devices, like a smartphone, still goes into the same connect/disconnect loop every second.
So after testing, here's what I've concluded:
- TLS is buggy. I can't get it to stay active no matter what I try. Local network with no firewall, mobile ISP, whatever, same result. I managed to get it working once for a short moment. I don't remember which firmware I flashed (I started from scratch) or in which order I configured everything. I spent so long on it that I wish I did. Note that it's bridging with some channels on the public MQTT server. At that time I only had port 8883 and TLS enabled in the MQTT config. Now I've added 1883, which isn't ideal, but for now it's just me and some friends testing things to see if we can deploy it for ourselves.
- Without TLS, a Meshtastic device using the proxy option and tethered to a phone still gets stuck in that same loop. I tested with the phone on the same Wi-Fi and also on mobile data (so not on the same network): same behavior.
That's my observation. Is it the firmware or the app on the phone? I don't know. The server logs say "MeshtasticAndroidMqttProxy-!xxxxxxxx closed its connection", so I guess it's not the fault of the MQTT server either (it's Mosquitto). Or could it be?!
This is super frustrating
Ok, so the problem is definitely with the RAK19007 + RAK4631 + RAK13800 setup. The ESP32-S3 seems to handle TLS way better than the other hardware.
I finally figured out why I thought TLS was working earlier. I've tested so many combos that I got mixed up: one of those tests was on the Seeed Studio XIAO ESP32-S3 + Wio-SX1262, and on that one TLS actually does work. I connected it, toggled TLS, and everything behaved normally. Even saying this, I can't confirm it's 100% that, but it's running and I didn't touch anything.
I spent so much time not knowing it was just that, since I had powered the XIAO once, then just unplugged it and tested everything else (I should have looked at the node ID in the log... anyway).
I only see one error in the MQTT server logs: OpenSSL Error[0]: error:0A000126:SSL routines::unexpected eof while reading
But anyway, the original problem is still there, and the whole point of my project (and I'm guessing others' too) is to have a solid fixed node bridged to a private MQTT server (it's so practical). The RAK19007 with its Ethernet module is basically the perfect modular setup for that.
Except it really doesn't like TLS.
It'd be amazing if this could be fixed. If it can be fixed, and if it's just a matter of code.
Here's what I've discovered now:
Messages on a private channel with a 256-bit key don't get sent if TLS is off. When checking the server logs (listening on msh/EU_868/2/e/PRIVATEEXAMPLE/# with -v), nothing shows up.
When TLS is on (on the XIAO), encrypted messages appear in the logs 😆
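When nothing shows up, it helps to rule out a subscription that simply doesn't cover the topic the node publishes to. The sketch below implements standard MQTT wildcard matching (`+` for one level, `#` for the remainder) so a filter can be checked against a topic offline. `topic_matches` is a hypothetical helper, and the example topic in the usage note is assumed for illustration, reusing the redacted node ID style from the logs above.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` is covered by the MQTT `topic_filter`.

    Implements the two standard wildcards: '+' matches exactly one level,
    '#' matches the parent level and everything below it and is only valid
    as the last level of the filter. Special handling of topics that start
    with '$' is deliberately left out of this sketch.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for index, level in enumerate(filter_levels):
        if level == "#":
            # Valid only as the final level; matches whatever remains.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            # The filter is longer than the topic.
            return False
        if level != "+" and level != topic_levels[index]:
            return False

    # No '#' seen: the topic must have exactly as many levels as the filter.
    return len(filter_levels) == len(topic_levels)
```

With the filter from this comment, `topic_matches("msh/EU_868/2/e/PRIVATEEXAMPLE/#", "msh/EU_868/2/e/PRIVATEEXAMPLE/!c2xxxxxx")` is true, while the same topic under a different channel name is not matched. If the filter does cover the expected topic and the log is still empty, the messages are most likely never reaching the broker.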
Ironically, the XIAO ESP32-S3 with the Wio-SX1262 was just a little dev toy I grabbed to play with and wasn't meant to do anything meaningful... and now it turns out to be the only one in my setup that actually handles TLS properly. I guess other ESP32-based devices would work, but we have all RAK now, because it's so much more power efficient :/
Please, if any dev comes across this post, can it be fixed? or is it just something that the hardware will never be able to handle???
Especially because so many portable devices in the field are RAK boards, and they get stuck in this loop with TLS when connecting to the server through the phone proxy.
EDIT: Is that why we can't use TLS on the public MQTT server?! That would make sense, since so many boards can't handle TLS.