esp-aws-expresslink-eval

Just stops after a while

Open pdbayes opened this issue 2 years ago • 20 comments

Hi. After thinking I had this sorted in issue #7, it is still not working correctly. It works and I can see the hello world message on the AWS IoT MQTT Test client, but it just stops, for no apparent reason, after a varying amount of time. I cannot see the reason for this and don't know how to get to the bottom of it. I have a suspicion it is something to do with the UART, as opening the Arduino serial logger seems to bring it back to life. I have tried it with a BME680 sensor that prints loads to the monitor, and that seems to manage only one connection and then it just doesn't connect, even though it keeps trying. I am using an UNO board that only has one UART, but it should be able to cope with this; it looks more like the Espressif board is not doing what it is supposed to after a while. Any ideas? This is becoming a real pain.

pdbayes avatar Aug 23 '22 11:08 pdbayes

Should there be a break on line 156? Why does it try to send data on line 164 if it knows it is not connected?

pdbayes avatar Aug 23 '22 12:08 pdbayes

@pdbayes Not sure if I understand your issue completely, but we do agree that this sketch has its own flaws around state transitions. These have been identified and fixed internally. We are running some tests on the fix and hope to get it merged on GH as soon as we can, hopefully in the next couple of days.

avsheth avatar Aug 23 '22 14:08 avsheth

Hi, my issue is that it initially connects and I can see hello world starting to appear every 10s or so on the AWS IoT Test client. Then, after a while, it just stops: no messages get to AWS and it never regains a connection unless the UNO is reset.

pdbayes avatar Aug 23 '22 14:08 pdbayes

Hi, I ran the test sketch with the Arduino serial logger and a TTL-to-USB converter and collected the logs. It went through the loop with no issues about 1000 times and then this happened:

OK 1 CONNECTED
OK 1 CONNECTED
ERR14 2 UNABLE TO CONNECT Failed to access network
OK 2 0 STARTUP
OK 1 CONNECTED
ERR8 PARAMETER UNDEFINED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
OK 1 CONNECTED
ERR8 PARAMETER UNDEFINED
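Responses like the ones in this capture can be classified mechanically when sifting through a long log. The following is a minimal sketch in plain standard C++, so it can be run on a PC against a saved capture. `ElResponse` and `parse_response` are hypothetical names, and the only format assumed is what the log itself shows: a line starts either with `OK` or with `ERR` followed by a number.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for one response line from the module.
struct ElResponse {
    bool ok = false;       // true when the line starts with "OK"
    int error_code = 0;    // number after "ERR"; -1 if the line is unrecognised
    std::string detail;    // remainder of the line after the status token
};

// Skip spaces starting at pos and return the rest of the line.
static std::string rest_after_spaces(const std::string& line, std::size_t pos) {
    while (pos < line.size() && line[pos] == ' ') ++pos;
    return line.substr(pos);
}

ElResponse parse_response(const std::string& line) {
    ElResponse r;
    if (line.rfind("OK", 0) == 0) {            // starts with "OK"
        r.ok = true;
        r.detail = rest_after_spaces(line, 2);
        return r;
    }
    if (line.rfind("ERR", 0) == 0) {           // starts with "ERR<number>"
        std::size_t pos = 3;
        while (pos < line.size() &&
               std::isdigit(static_cast<unsigned char>(line[pos]))) {
            r.error_code = r.error_code * 10 + (line[pos] - '0');
            ++pos;
        }
        r.detail = rest_after_spaces(line, pos);
        return r;
    }
    r.error_code = -1;                         // neither OK nor ERR
    r.detail = line;
    return r;
}
```

For example, `parse_response("ERR8 PARAMETER UNDEFINED")` gives `error_code == 8` with the rest of the line in `detail`, which makes it easy to count how often each error appears in a capture.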

I think the parameter is the topic; it seems to have lost the reference to topic1 when it failed to access the network. Perhaps on CONLOST the state needs to go back to STATE_EL_READY instead of PROVISIONED?

pdbayes avatar Aug 24 '22 07:08 pdbayes

So, by redefining the topic at various states, it's now been running for days with a BME680 and no issues. Is this a firmware issue? Surely the topic shouldn't be deleted if there is a disconnect.
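The workaround described here can be pictured as host-side bookkeeping: remember whether the topic has been configured and whether the module is connected, and clear those flags when the module reports an error or a restart. This is only a sketch under assumptions, not the code from the actual sketch. `HostState`, `Step` and the function names are hypothetical; `AT+CONF Topic1=` and `AT+SEND1` are assumed to be the command forms the sketch uses; and reading `ERR8` as "topic missing" is the hypothesis from this thread, not a documented fact.

```cpp
#include <cassert>
#include <string>

// Hypothetical host-side view of what the module currently has set up.
struct HostState {
    bool topic_configured = false;
    bool connected = false;
};

enum class Step { ConfigureTopic, Connect, Publish };

// Decide what to do next: configure first, then connect, then publish.
Step next_step(const HostState& s) {
    if (!s.topic_configured) return Step::ConfigureTopic;
    if (!s.connected) return Step::Connect;
    return Step::Publish;
}

// Build the AT command for a step (command forms are assumptions).
std::string command_for(Step step, const std::string& topic,
                        const std::string& message) {
    switch (step) {
        case Step::ConfigureTopic: return "AT+CONF Topic1=" + topic;
        case Step::Connect:        return "AT+CONNECT";
        case Step::Publish:        return "AT+SEND1 " + message;
    }
    return "";  // not reached for valid Step values
}

// Record that the module answered OK to the given step.
void on_ok(Step step, HostState& s) {
    if (step == Step::ConfigureTopic) s.topic_configured = true;
    if (step == Step::Connect) s.connected = true;
}

// Record an ERR response; codes 8 and 14 are the ones seen in the log.
void on_error(int code, HostState& s) {
    if (code == 8) s.topic_configured = false;   // assumed: topic was lost
    if (code == 14) s.connected = false;         // unable to connect
}

// After a module restart, assume nothing is configured any more.
void on_restart(HostState& s) {
    s = HostState{};
}
```

With this arrangement an `ERR8` after a restart sends the host back to configuring the topic before the next publish, rather than repeating a send that cannot succeed.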

pdbayes avatar Aug 29 '22 07:08 pdbayes

Hi. It has now stopped after almost a week. I definitely think the state machine in the firmware isn't quite right. Is there any progress on solving this?

pdbayes avatar Sep 01 '22 20:09 pdbayes

Just want to confirm: is this with the latest sketch, which we updated a couple of weeks ago?

avsheth avatar Sep 07 '22 06:09 avsheth

So, I tried the new sketch and it worked at first. We then had a power outage and I have not managed to get it working again since. If I do the commands manually it's fine. I think it may be a timeout issue on connecting: I have a Netgear mesh Wi-Fi system and the router is always up and running before any satellites. Devices then tend to connect to the router because it is up first, even when a better signal comes online slightly later. I think there needs to be a loop retrying the connection on a provisioned device, or it needs to wait for a response and act accordingly.

The reason I moved over to this device was that I have an ESP32 DEV kit v4 that works, but it randomly loses its connection to AWS and then gets stuck in a loop and has to be reset. It's really annoying that I can't get something stable working. I used to use Particle Photons, which worked flawlessly for years, but they are expensive and they have problems with my home network (it has 2 routers and is double NAT'd and they don't seem to like that).
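The retry loop suggested above could look roughly like the following. It is a sketch under assumptions, written in plain standard C++ so it can be tried on a PC: `backoff_ms` and `connect_with_retry` are hypothetical names, the 1 s base and 60 s cap are arbitrary choices, and `try_connect` stands in for whatever sends the connect command and checks the response. On an AVR board such as the UNO, `std::function` is not available, so the two callbacks would become direct function calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Delay before the next attempt: base_ms doubled per failed attempt,
// capped at max_ms.
std::uint32_t backoff_ms(unsigned attempt, std::uint32_t base_ms,
                         std::uint32_t max_ms) {
    assert(base_ms > 0 && max_ms >= base_ms);
    std::uint64_t delay = base_ms;
    for (unsigned i = 0; i < attempt && delay < max_ms; ++i) {
        delay *= 2;
    }
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(delay, max_ms));
}

// Try to connect up to max_attempts times, waiting between failures.
// try_connect: returns true once the module reports it is connected.
// wait_ms:     blocks for the given number of milliseconds.
bool connect_with_retry(const std::function<bool()>& try_connect,
                        const std::function<void(std::uint32_t)>& wait_ms,
                        unsigned max_attempts) {
    for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_connect()) {
            return true;
        }
        if (attempt + 1 < max_attempts) {
            wait_ms(backoff_ms(attempt, 1000, 60000));
        }
    }
    return false;
}
```

Growing the delay gives a slow-starting mesh satellite time to come up without hammering the module, while the cap keeps the worst-case wait bounded.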

pdbayes avatar Sep 07 '22 07:09 pdbayes

OK, so I missed the setTimeout and can see that it will wait 30 seconds for a response from the Connect message. Does the board have a way of checking it is actually online though?

pdbayes avatar Sep 08 '22 14:09 pdbayes

It is also possible that, as the device only operates in the 2.4 GHz band, the mesh router prefers the 5 GHz band and the device tries to connect to that; there seem to be a lot of issues with smart devices and routers/APs with 5 GHz. You can't have different SSIDs for each frequency, so I can't choose. I might see if the guest network can be limited to 2.4 GHz and use that.

pdbayes avatar Sep 08 '22 14:09 pdbayes

It worked OK for a while on a 2.4 GHz-only AP but still randomly just stops sending messages. This is unusable and is wasting a lot of my time and was a waste of money. It is supposed to make things easier, not harder.

pdbayes avatar Sep 21 '22 11:09 pdbayes

Hi @pdbayes, sorry for not getting back earlier. Give us some time; we have kept a device running for a long-duration test and will get back as soon as I can. By the way, can you let us know whether the internet or Wi-Fi went off at any time during the test run? It would be hard to know about the internet, so if you happen to have the ExpressLink logs, could you share them?

avsheth avatar Sep 22 '22 11:09 avsheth

It's possible the internet went off but I don't know. How do you access ExpressLink logs?

pdbayes avatar Sep 22 '22 14:09 pdbayes

Hi @pdbayes, You can access ExpressLink logs from UART0 i.e. the microUSB connector. You need to simultaneously open two consoles, one where you will give the AT commands and the other where you can see the ExpressLink logs.

Please refer to Section 6a of the README and this discussion for more info.

dhavalgujar avatar Sep 22 '22 14:09 dhavalgujar

So are you saying there would be a stored log on the board? Or do I have to have it running, connected to a PC collecting logs, until it stops working?

pdbayes avatar Sep 22 '22 19:09 pdbayes

There is no provision to store logs on the board; you will need to have it connected to a PC until it stops working.

Also, ExpressLink generates a CONLOST event if there is a network-related problem and it requires the host to explicitly issue the AT+CONNECT command again.
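A host could act on this by polling for events and mapping each one to an action. The following is a minimal sketch in plain standard C++, under assumptions: the event line is taken to have the shape seen earlier in this thread (`OK 2 0 STARTUP`, i.e. `OK`, two numbers, then a mnemonic), a line without that shape is treated as "no event", and `Event`, `Action`, `parse_event` and `action_for` are hypothetical names rather than anything from the sketch or firmware.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// What the host should do in response to an event.
enum class Action { None, Reconnect, ReconfigureAndReconnect };

// Hypothetical parsed form of an event line such as "OK 2 0 STARTUP".
struct Event {
    bool valid = false;
    int id = 0;
    int param = 0;
    std::string mnemonic;
};

// Parse "OK <id> <param> <MNEMONIC>"; anything else is marked invalid.
Event parse_event(const std::string& line) {
    Event e;
    std::istringstream in(line);
    std::string status;
    if (in >> status && status == "OK" &&
        in >> e.id >> e.param >> e.mnemonic) {
        e.valid = true;
    }
    return e;
}

// Map an event to a host action, keyed on the mnemonic.
Action action_for(const Event& e) {
    if (!e.valid) return Action::None;
    // Connection dropped: the host must issue AT+CONNECT again.
    if (e.mnemonic == "CONLOST") return Action::Reconnect;
    // Module restarted: assume configuration must be resent as well.
    if (e.mnemonic == "STARTUP") return Action::ReconfigureAndReconnect;
    return Action::None;
}
```

Keying on the mnemonic rather than the numeric identifier keeps the sketch independent of the exact event numbering, which is not spelled out in this thread.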

dhavalgujar avatar Sep 22 '22 19:09 dhavalgujar

usb_log.txt

This is the log from all the loops, from just after trying to send to just after trying to send again. It is not getting any messages to AWS.

pdbayes avatar Sep 22 '22 19:09 pdbayes

after_power_cycle.txt

After a power-down and power-up cycle it works OK; here is the log.

pdbayes avatar Sep 22 '22 19:09 pdbayes

Noted, thanks a lot for the logs!

We have improved the handling of network disruptions in the next release. It will fix the issue that you are seeing and cleanly disconnect (do a complete Wi-Fi disconnect) when there is a network-related issue. However, as I mentioned before, the host will have to explicitly issue AT+CONNECT again.

The release will be made available shortly; I will let you know here as well.

dhavalgujar avatar Sep 22 '22 20:09 dhavalgujar

The AT+CONNECT would be handled by using the states in the sketch?

pdbayes avatar Sep 22 '22 20:09 pdbayes