airbnk_mqtt

Stability improvements

Open rospogrigio opened this issue 3 years ago • 164 comments

Let's use this post to discuss ways to improve the stability of the connection: use a different firmware, rebuild tasmota, etc.

rospogrigio avatar Nov 26 '21 14:11 rospogrigio

Reposting this here as I think it better fits this repo than the other one:

Has anyone identified the chip on the physical device? I have the M300 Bluetooth version and the chip has a similar pinout to an ESP8266, but I'm not good enough at reverse engineering to be sure. The top of the chip is etched away just enough to make it impossible to make out the original number; at the right angles you can make out some lines, but not enough to form full letters.

Adrian-at-CrimsonAzure avatar Dec 06 '21 02:12 Adrian-at-CrimsonAzure

Reposting this here as I think it better fits this repo than the other one:

Has anyone identified the chip on the physical device? I have the M300 Bluetooth version and the chip has a similar pinout to an ESP8266, but I'm not good enough at reverse engineering to be sure. The top of the chip is etched away just enough to make it impossible to make out the original number; at the right angles you can make out some lines, but not enough to form full letters.

I don't believe anyone here has torn the device down. I've got the same lock. However, the ESP8266 doesn't have BT, afaik; it only has WiFi. Do you think there's an ESP32 inside as well?

formatBCE avatar Dec 06 '21 02:12 formatBCE

Main board of M300: Top Bottom

There is a sub-board I was too lazy to get to that all the connectors go to (except bottom right, which goes to the DC motor) but that board seems to just connect the batteries and provide some pre-regulation for the mainboard. The little unpopulated header seems to connect to power and three other pins, probably RX/TX/GPIO0 like most Tuya products. Haven't hooked up anything to it yet, might finally be my excuse to get a logic analyzer...

Completely slipped my mind that the ESP8266 doesn't have Bluetooth; I've been working with the ESP32 too much lately. This has an 8x8 layout whereas the ESP32 has a 16x16, so no chance of it being an ESP.

Adrian-at-CrimsonAzure avatar Dec 06 '21 02:12 Adrian-at-CrimsonAzure

@rospogrigio today I had spare time to dig into the ESP32 BLE stack. I managed to post the advertisement to MQTT, and it works very well. Also, the topics to use will be more obvious, there's no need to create rules, and I believe the integration will be able to send both messages at once instead of waiting for the first chunk to be written. It will make your code much shorter and easier to maintain.

However, now I'm stuck on sending data to the lock: for some reason it hangs when the ESP32 tries to initiate the BLE client to connect to the characteristic. Hopefully I will find a solution for this. Other things are working already. (Well, I will also have to find a way to configure the ESP on the fly, but that's just a matter of writing it, I believe.)

formatBCE avatar Dec 07 '21 02:12 formatBCE

@rospogrigio So for now, I've got a way to send commands and scan at the same time. However, sending (basically, connecting to the lock) is giving me a hard time.

I'm getting this error: lld_pdu_get_tx_flush_nb HCI packet count mismatch (0, 1)

It happens on connection, and what's worse, after that the ESP becomes unresponsive; I cannot even reconnect with the serial monitor. It could be that the lock is too far away (it actually is far), or that there isn't enough power (it's just PC USB, after all), but it should at least reboot, and instead it just hangs.

Didn't find any reliable information on this yet. Gonna keep digging. Any help appreciated.

formatBCE avatar Dec 07 '21 08:12 formatBCE

Sorry but I really don't have any expertise on this, wish I could help... As a side note, I'm working on adding a configurable option to set a desired number of automatic retries in case of a FAILCONNECT event, should be ready very soon. Keep on digging, I'm sure you can make it!

rospogrigio avatar Dec 07 '21 08:12 rospogrigio

So what I have now:

  • managed to connect to the lock, and get to the service/characteristic
  • changed @rospogrigio's integration to work with the new topics
  • removed unnecessary code from the integration, as now we can send both data chunks in one JSON
  • managed to bring the lock alive and get it sending lock/unlock commands to MQTT
  • tried to write the two chunks consecutively to the characteristic

Here I'm stuck. It says "true", i.e. the write operation reports success, but the lock doesn't respond to commands. Either the data format is incorrect (I'm sending from *ptr, so that might be the problem, or maybe I have to convert it from a string before sending, I don't know), or the write operation is lying to me (unlikely, because the characteristic value seems to be changing). If someone has any clues about what to do or how to debug this, they're welcome.

Next steps (after getting it working, of course):

  • unify/rearrange topics system
  • make use of telemetry messages (might be useful for checking gateway status)
  • make UI for initial setup
  • think about removing some parameters that were needed because Tasmota is general-purpose, but are unnecessary now with strict gateway binding
  • think about retries (they're easier to do on the gateway than in the integration)

To the last point: the gateway is REALLY powerful and stable. It reboots itself in a matter of seconds in case of failure, and can connect to the lock (almost) without failures from about 7-8 meters and through 1 wall. That's what I was expecting.

formatBCE avatar Dec 09 '21 04:12 formatBCE

Yes, I believe writing the string itself was a strange solution :)

Will try to convert it back to bytes (I believe there are 20 bytes in each of the two commands), and send that.
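A minimal sketch of that conversion, written in Python for brevity (the gateway itself is ESP32 firmware). It assumes the two commands arrive as hex-encoded strings, which the thread does not confirm; `hex_to_chunk` and the sample value are hypothetical.

```python
CHUNK_LEN = 20  # bytes per command chunk, as estimated in the post above


def hex_to_chunk(hex_text: str) -> bytes:
    """Decode a hex string into raw bytes and check the expected length."""
    raw = bytes.fromhex(hex_text)  # raises ValueError on non-hex input
    if len(raw) != CHUNK_LEN:
        raise ValueError(f"expected {CHUNK_LEN} bytes, got {len(raw)}")
    return raw


# Made-up sample data, not a real lock command.
sample = "ff00" + "11" * 18
chunk = hex_to_chunk(sample)
print(len(chunk), chunk[:2].hex())
```

Checking the length before writing makes a wrong encoding fail loudly on the sender side instead of being silently accepted by the BLE write.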

Also, I have some doubts about the integration's current logic for determining status. Although the MQTT success message has been received, it still stays in the operating status. That's fine for now, as it doesn't work yet, but I guess there should be a way to integrate more closely, instead of waiting for a changed adv message.

formatBCE avatar Dec 09 '21 07:12 formatBCE

I can help you with this: the FFF3 characteristic (you can read it, right? or at least you can use nRF Connect for this) has some status bytes (byte 5, specifically) that provide an error code in case of failure. This is how I understood that I was sending the wrong payloads, too. Look at this function:

    public static String resultString(byte[] bArr2) {
        String str = "";
        if (bArr2[0] == -86) {
            switch (bArr2[5]) {
                case 0:
                    str = "Success";
                    if (bArr2[3] == 2) {
                        if (bArr2[4] != 1) {
                            if (bArr2[4] == 2) {
                                str = str + ("  Device time:" + ((long) PackMaker.byte4ToInt(bArr2, 6)));
                                break;
                            } else {
                                byte b = bArr2[4];
                                break;
                            }
                        } else {
                            str = str + ("  Service time:" + ((long) PackMaker.byte4ToInt(bArr2, 6)));
                            break;
                        }
                    }
                    break;
                case 1:
                    str = "Fail";
                    break;
                case 2:
                    str = "Invalid role type";
                    break;
                case 3:
                    str = "Invalid operation type";
                    break;
                case 4:
                    str = "Invalid opcode";
                    break;
                case 5:
                    str = "No operation authority";
                    break;
                case 6:
                    str = "Invalid signature";
                    break;
                case 7:
                    str = "Serial number expired";
                    break;
                case 8:
                    str = "Out of check-in time";
                    break;
                case 9:
                    str = "Service time expired";
                    break;
                case 10:
                    str = "Locked, cannot open the door";
                    break;
                case 11:
                    str = "Not initialized";
                    break;
                default:
                    str = "";
                    break;
            }
        }
        String str2 = "" + ((int) bArr2[5]);
        return str;
    }
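
For anyone debugging from the integration side, the same lookup can be sketched in Python. This is only a transcription of the table above, not code from the integration: `result_string` and `RESULT_CODES` are names chosen here, and the "Service time"/"Device time" suffix handled by the Java version is left out.

```python
HEADER = 0xAA  # the Java check `bArr2[0] == -86` is 0xAA as an unsigned byte

# Error codes carried in byte 5, copied from the Java switch above.
RESULT_CODES = {
    0: "Success",
    1: "Fail",
    2: "Invalid role type",
    3: "Invalid operation type",
    4: "Invalid opcode",
    5: "No operation authority",
    6: "Invalid signature",
    7: "Serial number expired",
    8: "Out of check-in time",
    9: "Service time expired",
    10: "Locked, cannot open the door",
    11: "Not initialized",
}


def result_string(payload: bytes) -> str:
    """Return the status text for an FFF3 payload, or "" if not recognised."""
    if len(payload) < 6 or payload[0] != HEADER:
        return ""
    return RESULT_CODES.get(payload[5], "")


# Made-up payload: header byte, four filler bytes, then error code 6.
print(result_string(bytes([0xAA, 0, 0, 0, 0, 6])))
```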

Hope this might help, bye...

rospogrigio avatar Dec 09 '21 08:12 rospogrigio

To the last point: the gateway is REALLY powerful and stable. It reboots itself in a matter of seconds in case of failure, and can connect to the lock (almost) without failures from about 7-8 meters and through 1 wall. That's what I was expecting.

This is REALLY impressive!! Can't wait to see it in action!! 😮

rospogrigio avatar Dec 09 '21 09:12 rospogrigio

I MADE IT!

Opening/closing works. Status updates don't work so well yet; there's a gap to fill. @rospogrigio here's the situation, please give me advice:

  • scanning for devices (which is what actually gets the lock adv) has a scan time and a wait time. Basically, that's the active scanning window and the interval between active scans;
  • the longer the active scan is, the higher the probability of getting adv data from the lock. Now I'm scanning for 4 seconds, and waiting for 1 second in between. Works fairly well.
  • however, during an active scan I cannot send commands. The ESP has to wait for the scan to end before connecting to the device. So the gap between getting the command and writing it can be up to 4 seconds now.
  • moreover, after sending I launch the scan again. And the first adv from the lock will be ready only after the next 4 seconds (or more, if it doesn't find the lock on the first scan, which can be affected by the distance to the lock, for example).
  • also, the integration doesn't seem to have good support for my topics here: I don't know what to do with the status while we're waiting for the lock adv, although we already got WRITE SUCCESS from the gateway. I believe we can trust that success message (an error will be posted for every problem).
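
To put numbers on the scan gap described in the list above, here is a small back-of-the-envelope calculation. It assumes (not stated in the thread) that commands arrive at a uniformly random moment in the scan/wait cycle and that a command arriving during the wait interval is written immediately; `command_delay` is a name made up for this sketch.

```python
def command_delay(scan_s: float, wait_s: float) -> tuple:
    """Return (worst_case, average) delay before a command can be written."""
    cycle = scan_s + wait_s
    # Worst case: the command arrives just as a scan window starts.
    worst = scan_s
    # A command lands inside a scan with probability scan_s / cycle and then
    # waits scan_s / 2 on average; during the wait interval the delay is 0.
    average = (scan_s / cycle) * (scan_s / 2)
    return worst, average


worst, average = command_delay(scan_s=4.0, wait_s=1.0)
print(f"worst case {worst:.1f} s, average {average:.1f} s")
```

With the 4 s scan and 1 s wait quoted above, this gives a worst case of 4 s and an average of about 1.6 s under those assumptions.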

So what i want you to think:

  • with this gateway, your integration can theoretically get any data you need. Some calculations, I believe, could be done on the gateway too. Let's think about what you want to change in this interaction.
  • based on this, can we think about better status support in the integration? It's a bit clunky now, which, I believe, was because of the restrictions of the Tasmota BLE stack. Now we can optimize that.
  • what can we do about the scan gap? I could try to decrease the scan time right after sending a command, I guess. But that won't eliminate the 4-second gap if a scan has just started when we get a command to run. Are we ok with this gap?
  • maybe you have some other brilliant ideas to help improve this.

Meanwhile, i will focus on initial gateway configuration.

formatBCE avatar Dec 09 '21 18:12 formatBCE

New update: I managed to make scanning stop as soon as a command is received. Now it responds to commands almost instantly.
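
The idea of cutting a scan window short when a command arrives can be illustrated with a small simulation. This is not the gateway's code (that is ESP32 firmware, and its mechanism is not shown in this thread); it is a Python sketch that uses `threading.Event` as a stand-in for "a command was received", with a hypothetical `scan` function.

```python
import threading
import time


def scan(stop: threading.Event, scan_s: float) -> bool:
    """Simulate one scan window; return True if it was cut short."""
    # Event.wait returns True as soon as the event is set,
    # or False if the timeout expires first.
    return stop.wait(timeout=scan_s)


stop_scan = threading.Event()

# Simulate a command arriving 0.1 s into a 4 s scan window.
threading.Timer(0.1, stop_scan.set).start()

started = time.monotonic()
interrupted = scan(stop_scan, scan_s=4.0)
elapsed = time.monotonic() - started
print(interrupted, elapsed < 1.0)
```

The scan returns shortly after the event is set rather than after the full 4 seconds, which is the behaviour that removes the command gap.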

formatBCE avatar Dec 09 '21 20:12 formatBCE

Cool, congratulations! So, my few first thoughts after reading this:

  • 4 secs seems like quite a lot, I believe we can find a way to get a bit more responsiveness
  • we actually don't need to scan for the advert: all the info we need from there (lock status, lock events and battery) can be read from FFF3 characteristic
  • I would not do calculations on the gateway, it's easier to debug if we do them in the integration

rospogrigio avatar Dec 09 '21 20:12 rospogrigio

Oops, I just read your last message... Great!! When can you share your code and firmware so I can try it?

rospogrigio avatar Dec 09 '21 20:12 rospogrigio

I guess it's about the time. Don't want to get interrupted by something and leave you without anything in hands :)

Let me create a PR for your integration, and a repo for the gateway. Actually, without the Arduino IDE you won't be able to run it: all the stuff like topics, WiFi parameters and MQTT parameters is hard-coded now. I'm trying to move them into some initial setup. But at least we will have some code in the cloud.

formatBCE avatar Dec 09 '21 20:12 formatBCE

Cool! Regarding whether to get the values from the advert or FFF3, I seem to remember that nourmehdi wrote that FFF3 consumed more power, so we'd probably better use the advert, since you found a way to interrupt the scan on command. Can't wait to see the code! 😉

rospogrigio avatar Dec 09 '21 20:12 rospogrigio

Here's PR: https://github.com/rospogrigio/airbnk_mqtt/pull/7 If you prefer another branch, just let me know.

formatBCE avatar Dec 09 '21 20:12 formatBCE

And here https://github.com/formatBCE/Airbnk-MQTTOpenGateway is the repository. Change things in Settings.h, build and upload. I use VS Code with the PlatformIO plugin.

formatBCE avatar Dec 09 '21 21:12 formatBCE

Super super cool! Tomorrow I'll try to find some time to play with it. Could it be possible to set the options in some other way instead of rebuilding it maybe?

rospogrigio avatar Dec 09 '21 21:12 rospogrigio

Could it be possible to set the options in some other way instead of rebuilding it maybe?

Yes, I'm thinking of making a WiFi access point with a simple web page for initial setup; when submitted, it will save the prefs, reboot and connect to the actual WiFi. I've never done it before, so it will take some time. But it's doable.

formatBCE avatar Dec 09 '21 21:12 formatBCE

@rospogrigio Ok, I uploaded a new version to https://github.com/formatBCE/Airbnk-MQTTOpenGateway You can find the built binary for ESP32 in the repo root (firmware.bin).

Flash it, and it will start the access point AirbnkOpenGateway. Connect to that AP, and go to 192.168.4.1 in a browser.

Fill in the data there. It will create the configuration and reboot the ESP32 to connect to your WiFi. The chip will indicate (most probably with the blue LED) when it's done. You can also find messages in MQTT Explorer. The IP will be there, in the "tele" subtopic.

Check it out, and tell me what's there.

If you screw up the config and it won't connect, after some attempts it will reset the config and re-deploy the AP. You can perform the same config reset from the web UI, by navigating to the ESP's IP address as assigned by your router.

Cheers.

formatBCE avatar Dec 10 '21 01:12 formatBCE

Ok, I guess tomorrow I will change the behavior a bit, because rebooting my HA host (with the MQTT add-on) was enough to make the gateway reset :) Will make it more patient with a missing MQTT connection.

formatBCE avatar Dec 10 '21 03:12 formatBCE

Did it; now it will erase the prefs only on WiFi disconnect.

formatBCE avatar Dec 10 '21 07:12 formatBCE

Ok I have some ideas for the code merging, I'll take care of that so you can concentrate in improving the firmware if you believe there's more work to do there. Good job! 👍

rospogrigio avatar Dec 10 '21 08:12 rospogrigio

@formatBCE how do I flash the firmware? Can I use the Tasmota interface, as far as you know? Edit: I also merged your code, creating a separate class for the new gateway but still keeping the older Tasmota device. I also moved the code generation into a dedicated class. Please see PR #9; I'd suggest working from there from now on. Let me know!

rospogrigio avatar Dec 10 '21 14:12 rospogrigio

OK, flashed, configured and launched HA but all entities are unavailable and I see nothing happening... I might have broken something, how can I debug the communication?? Edit: OK fixed almost everything, I have also managed to operate the lock once, even though it took more than 5 secs to operate. Moreover, after I operate it all entities become unavailable and I no longer receive messages so I have to reset the ESP, is it normal?

rospogrigio avatar Dec 10 '21 16:12 rospogrigio

Hi!

Nope, it's not normal. For me it works all the time: yesterday I tried opening and closing about 50 times, and got no errors. I implemented retries inside the code, so it tries up to 4 times to connect and send. That helped to reduce failures almost to zero. You can debug in different ways. First, what do you see in the logs of your integration? They should have some info. Also, you can download MQTT Explorer and check the messages arriving in your root topic to see the exact messages. And you can connect the ESP to your PC's USB, use PuTTY to connect to the serial port the ESP is on (baud rate 115200), and check the logs from the ESP itself.
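
The retry behaviour mentioned above (up to four attempts to connect and send) follows a common pattern that can be sketched in a few lines. This is an illustration in Python, not the gateway's firmware code; `send_with_retries` and `flaky_send` are hypothetical names, and the use of `ConnectionError` as the failure signal is an assumption.

```python
def send_with_retries(send, attempts: int = 4) -> int:
    """Call send() until it succeeds; return the attempt number that worked."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            send()
            return attempt
        except ConnectionError as err:
            last_error = err  # remember the failure and try again
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Hypothetical stand-in for "connect to the lock and write the command":
# it fails twice, then succeeds.
calls = {"n": 0}


def flaky_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("lock did not answer")


used = send_with_retries(flaky_send)
print(used)
```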

I just woke, and realized that my ESP disconnected from WiFi 1 hour ago. I think i will make some remote logging to trace the problem. WiFi/Mqtt part for this project i took from another sketch, so maybe worth optimizing.

Thank you for merge, gonna try that today.

formatBCE avatar Dec 10 '21 17:12 formatBCE

Ok I did some tests and I confirm it works like 100% of the times, and from a bigger distance (like 3-4m, inside a wooden cabinet). With Tasmota I was having 50% of failures from 1m in open air... Awesome job!!! I still have some strange status changes so I agree it's improvable. Also, I had set everything up with Tasmota to allow a gateway to connect to multiple locks (ok, I know it's a very unlikely event so I guess we can keep it like this). I'll test it for some days and will provide suggestions. First one: maybe we can pass the MAC address and the number of retries as parameters from the integration? Think about it and let me know your opinion. Thank you so much, I'm so proud of our achievement!!

rospogrigio avatar Dec 10 '21 17:12 rospogrigio

Thank you for kind words :)

Yes, it's doable (both multiple locks and sending parameters over MQTT). If we really need this, I will implement it :)

The downsides of having multiple locks will be the following:

  • longer scans (now I restart the scan session right after we get the adv from the lock, to make it work faster. We'd have to wait for all locks, and would probably not even get some of them, so we'd just be waiting until the scan timeout)
  • longer operation times (first we'd need to wait longer on the scan, then if there's already an operation ongoing for one lock, we'd have to wait again)
  • somewhat more complicated logic for waiting/releasing between operations and scanning.
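
The first downside in the list above can be made concrete with a toy model. The sketch below is an assumption-laden illustration in Python, not gateway code: it treats a scan as a stream of advertised MAC addresses and uses a count of adverts (`max_adverts`) as a stand-in for the scan timeout; `scan_for_locks` and the MAC strings are made up.

```python
def scan_for_locks(adverts, targets, max_adverts):
    """Consume adverts until every target MAC is seen or the budget runs out.

    Returns (seen_macs, adverts_consumed).
    """
    pending = set(targets)
    seen = set()
    consumed = 0
    for mac in adverts:
        if consumed >= max_adverts:
            break  # stand-in for the scan timeout
        consumed += 1
        if mac in pending:
            pending.discard(mac)
            seen.add(mac)
        if not pending:
            break  # every lock answered, so the scan can restart early
    return seen, consumed


adverts = ["AA:01", "BB:02", "CC:03", "DD:04"]

# One lock: the scan ends as soon as its advert shows up.
print(scan_for_locks(adverts, {"BB:02"}, max_adverts=4))

# Two locks, one out of range: the scan runs to the timeout.
print(scan_for_locks(adverts, {"BB:02", "ZZ:99"}, max_adverts=4))
```

With a single lock the scan stops after two adverts; with a second lock that never advertises, it consumes the full budget, which is the "waiting till scan timeout" case.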

formatBCE avatar Dec 10 '21 20:12 formatBCE

I improved stability and reconnection flow for gateway, and optimized usage of resources a bit. Check out my newest commit.

Today my main gateway worked for whole day with current stability changes, no drops. Also, on test gateway i tried to drop it forcefully, it was remaining stable.

We'll see in long run, but i'd consider it as RC.

formatBCE avatar Dec 11 '21 01:12 formatBCE