
Bonded clients do not work when switching from 1.4.2 to 2.0.0 or vice versa

Open KlausMu opened this issue 1 year ago • 13 comments

I am using NimBLE with an ESP32. It seems the way bonded clients are stored in NVS changed from NimBLE 1.4.2 to 2.0.0.

Going from 1.4.2 to 2.0.0

In 2.0.0, I can connect to an already bonded peer (which was bonded in 1.4.2). Even direct advertisement works (see https://github.com/h2zero/NimBLE-Arduino/issues/651). But when sending data, the result is

D NimBLECharacteristic: >> sendValue
D NimBLECharacteristic: << sendValue: No clients subscribed.
D NimBLECharacteristic: >> sendValue
D NimBLECharacteristic: << sendValue: No clients subscribed.

Normally, in version 1.4.2, when sending data, there is something like

D NimBLECharacteristic: >> setValue: length=8, data=0000510000000000, characteristic UUID=0x2a4d
D NimBLECharacteristic: << setValue
D NimBLECharacteristic: >> notify: length: 8
D NimBLECharacteristicCallbacks: onNotify: default
D NimBLEServer: >> handleGapEvent:
D NimBLECharacteristicCallbacks: onStatus: default
D NimBLECharacteristic: << notify
D NimBLECharacteristic: >> setValue: length=8, data=0000000000000000, characteristic UUID=0x2a4d
D NimBLECharacteristic: << setValue
D NimBLECharacteristic: >> notify: length: 8
D NimBLECharacteristicCallbacks: onNotify: default
D NimBLEServer: >> handleGapEvent:
D NimBLECharacteristicCallbacks: onStatus: default
D NimBLECharacteristic: << notify

As a result, I have to delete all bonds in 2.0.0 and pair the peers again. From then on, everything works as expected.

Going from 2.0.0 to 1.4.2

When going back from 2.0.0 to 1.4.2, the software crashes at

void NimBLEDevice::init(const std::string &deviceName) {
  ble_store_config_init();

If there was no bonded peer in 2.0.0, going back to 1.4.2 works without problem.

KlausMu avatar Nov 02 '24 09:11 KlausMu

It appears this may be a result of upstream changes in the way the data is stored in NVS. I haven't identified the changes yet, but this may just have to be another of the breaking changes coming with the 2.0.0 release.

h2zero avatar Nov 03 '24 00:11 h2zero

@KlausMu I have identified the cause of this issue. It was an intended change, and a good one. The commit here changes the IRK to a random one when the device is freshly flashed and booted, so that every ESP32 has a different IRK rather than all of them sharing the same one, which was a problem when trying to bond one device with many ESP32s.

h2zero avatar Dec 03 '24 18:12 h2zero

Ok, thanks. Do I understand correctly:

Going from 1.4.2 to 2.0.0

ESP32 is freshly flashed and booted, IRK changes, already bonded devices do not work because of the changed IRK. As an application developer I could store the last used version (1.4.2 or 2.0.0) in NVS and in that case delete the bonds or at least let the user do it.

Going from 2.0.0 back to 1.4.2

What is happening to the random IRK generated when 2.0.0 is used for the first time? Will it be set back to the default one? Why is the ESP32 crashing on boot? Anything we can do to avoid this? Is the only advice to erase the flash?

KlausMu avatar Dec 03 '24 20:12 KlausMu

Going from 1.4.2 to 2.0.0 ESP32 is freshly flashed and booted, IRK changes, already bonded devices do not work because of the changed IRK.

Correct, this basically changes the device identity.

Going from 2.0.0 back to 1.4.2 What is happening to the random IRK generated when 2.0.0 is used for the first time? Will it be set back to the default one? Why is the ESP32 crashing on boot? Anything we can do to avoid this? Is the only advice to erase the flash?

The NVS partition will still have that IRK, the old firmware will not be aware of it.

I can't say for sure what the crashing is, I would need to test it. I suspect it's due to the different data stored in NVS due to the IRK change.

Best bet is to erase the flash because the NVS format will have been changed. Or at least erase the NVS on boot. You could do this by writing a version parameter or something and detecting it.
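
A minimal sketch of the version-marker idea, written as a pure function so the logic can be checked off-device. The names `BondFormat`, `MarkerDecision`, and `decide_on_boot` are hypothetical and not part of NimBLE or ESP-IDF. On the ESP32, the marker could be read with `nvs_get_u8` and stored with `nvs_set_u8` followed by `nvs_commit`, under a key and namespace of the application's choosing.

```cpp
#include <cstdint>

// Hypothetical marker values, one per bond storage format (assumption, not a NimBLE API).
enum class BondFormat : std::uint8_t { Core14 = 1, Core15 = 2 };

struct MarkerDecision {
    bool erase_bonds;   // wipe the "nimble_bond" namespace before NimBLEDevice::init
    bool write_marker;  // persist the running format afterwards
};

// marker_found:  true if the application's marker key was present in NVS
// stored:        value of that marker (ignored when marker_found is false)
// running:       format used by the firmware that is booting now
// bonds_present: true if bond data was found in NVS
inline MarkerDecision decide_on_boot(bool marker_found,
                                     BondFormat stored,
                                     BondFormat running,
                                     bool bonds_present) {
    if (!marker_found) {
        // No marker yet: existing bonds are of unknown origin, so drop them.
        return {bonds_present, true};
    }
    if (stored != running) {
        // Library generation changed since the last boot.
        return {true, true};
    }
    return {false, false};
}
```

This only helps if every firmware build involved contains the check. A build that predates the marker cannot read it, so a downgrade to such a build is not covered.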

h2zero avatar Dec 03 '24 20:12 h2zero

@KlausMu I have created a way to detect the downgrade and only erase the NimBLE bond storage info instead of the entire NVS.

    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        err = nvs_flash_erase();
        if (err == ESP_OK) {
            err = nvs_flash_init();
        }
    }

    if (err != ESP_OK) {
        Serial.printf("nvs_flash_init() failed; err=%d", err);
    }

    nvs_handle_t nimble_handle;
    err = nvs_open("nimble_bond", NVS_READWRITE, &nimble_handle);
    if (err != ESP_OK) {
        // Without a valid handle, none of the calls below may be made.
        Serial.printf("NVS open operation failed; err=%d\n", err);
    } else {
        // Both keys are written only by 2.x, so finding either one means 2.x data is stored.
        size_t required_size = 0;
        err = nvs_get_blob(nimble_handle, "rpa_rec_1", NULL, &required_size);
        if (err != ESP_OK) {
            err = nvs_get_blob(nimble_handle, "local_irk_1", NULL, &required_size);
        }

        if (err == ESP_OK) {
            nvs_erase_all(nimble_handle);
            nvs_commit(nimble_handle);
        }

        nvs_close(nimble_handle);
    }

Give that a try and let me know.

h2zero avatar Dec 09 '24 17:12 h2zero

@h2zero I think this could be useful, but I am not 100% sure which cases your code covers.

So when exactly is rpa_rec_1 present, and when is local_irk_1 present?

I did some tests, and I think I understood the following:

rpa_rec_1 is present exactly if a bond from 2.0.0 is in NVS

local_irk_1
in 2.0.0: seems always to be present, no matter if bonds are in NVS or not, no matter if freshly flashed or not
in 1.4.2: present only if 2.0.0 was flashed before, no matter if bonds are in NVS or not

Overall, I think the following pseudo code could be used:

if ((NimBLE == 2.0.0) && (numbonds > 0) && (rpa_rec_1 is NOT present))  {
  // we are in 2.0.0  
  // we have at least one bond
  // but this bond is not from 2.0.0
  // -> we are going from 1.4.2 to 2.0.0 with an already bonded peer from 1.4.2
  // -> we have to delete bonds, because they will connect, but they will not work properly
  nvs_erase_all("nimble_bond") 

} else if ((NimBLE == 1.4.2) && (rpa_rec_1 is present)) {
  // we are in 1.4.2
  // we have a bond from 2.0.0 in NVS
  // -> we are going back from 2.0.0 to 1.4.2 with an already bonded peer from 2.0.0
  // -> we have to delete bonds, otherwise it will crash
  nvs_erase_all("nimble_bond") 

}

Do you think this is correct? How can I get the version number of NimBLE at runtime?

KlausMu avatar Dec 10 '24 21:12 KlausMu

Both of those keys are only present in version 2.x (nimble core 1.5), so in your app you can use the code I provided to detect if they exist and erase the bonds if found. I'd put this right at the beginning of setup, before any NimBLE calls.

h2zero avatar Dec 10 '24 21:12 h2zero

Ok, they are only set by 2.0.0. And if I am in 1.4.2, I should simply delete the bonds (and these two blobs) in any case.

But I think the first part of my code (detect upgrade from 1.4.2 to 2.0.0) should be correct, right? If I am in 2.0.0 and numBonds > 0 and rpa_rec_1 is NOT present (or both blobs are not set, should make no difference), I also have to delete all bonds.

KlausMu avatar Dec 10 '24 21:12 KlausMu

Yes, that would be correct as well.

h2zero avatar Dec 10 '24 21:12 h2zero

Ok, I'll test it in more detail and give feedback here. Thanks!

KlausMu avatar Dec 10 '24 21:12 KlausMu

Two more questions:

Question 1

NimBLEDevice::getNumBonds() is only available after NimBLEDevice::init(deviceName) has been called. But init already makes the ESP32 crash. Is there any other way to check whether bonds are already stored in NVS? Any other data I could read directly from NVS? Sorry for asking, I tried digging through ble_store.c and ble_store_util.c, but it is really hard for me to understand ...

Question 2

After erasing the nimble_bond namespace, NimBLEDevice::getNumBonds() still returns the bonded peers. I believe they are saved somewhere in RAM after they have been read from NVS. Can I force refilling the RAM from NVS? Or do I have to reboot the ESP32?

KlausMu avatar Dec 11 '24 12:12 KlausMu

@KlausMu You can check for peer_sec_1 which will be there if at least 1 bond exists.
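
A rough sketch of how the three key checks could be gathered before NimBLEDevice::init is called. The lookup is passed in as a callback so the same code runs against a fake store on a PC. `KeyProbe`, `BondSnapshot`, `snapshot_bond_keys`, and `make_fake_probe` are hypothetical names. On the ESP32, the callback would be assumed to wrap the `nvs_get_blob(handle, key, NULL, &size) == ESP_OK` check used earlier in this thread.

```cpp
#include <functional>
#include <set>
#include <string>

// Returns true if the given key exists in the "nimble_bond" namespace.
using KeyProbe = std::function<bool(const char* key)>;

struct BondSnapshot {
    bool bond_exists;  // "peer_sec_1": at least one bond is stored
    bool rpa_exists;   // "rpa_rec_1": only written by 2.x
    bool irk_exists;   // "local_irk_1": only written by 2.x
};

// Queries the three keys discussed in this thread through the supplied probe.
inline BondSnapshot snapshot_bond_keys(const KeyProbe& key_exists) {
    BondSnapshot s;
    s.bond_exists = key_exists("peer_sec_1");
    s.rpa_exists  = key_exists("rpa_rec_1");
    s.irk_exists  = key_exists("local_irk_1");
    return s;
}

// Host-side stand-in for NVS, intended only for testing the logic off-device.
inline KeyProbe make_fake_probe(const std::set<std::string>& keys) {
    return [keys](const char* key) { return keys.count(key) > 0; };
}
```

Keeping the NVS access behind a callback means the decision logic never touches NimBLE itself, which matters here because init is the call that crashes.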

After erasing the nimble_bond namespace, NimBLEDevice::getNumBonds() still returns the bonded peers. I believe they are saved somewhere in RAM after they have been read from NVS. Can I force refilling the RAM from NVS? Or do I have to reboot the ESP32?

I can't think of any way to clear the RAM other than reset.

h2zero avatar Dec 11 '24 14:12 h2zero

Finally, I got it working, both when upgrading from 1.4.x to 2.0.x and when downgrading from 2.0.x to 1.4.x. Thanks @h2zero for your help. For those who are interested, here is the complete code of a function called delete_bonds_if_NimBLE_version_changed(). It is best called as early as possible in setup(), and it must be called before NimBLEDevice::init.

// This include is only needed to determine if NimBLE 1.4.x or 2.0.x is used.
// NimBLE 2.0.x is using nimble core 1.5, and only in this version BLE_STORE_OBJ_TYPE_LOCAL_IRK is defined
#include "nimble/nimble/host/include/host/ble_store.h"
#if defined(BLE_STORE_OBJ_TYPE_LOCAL_IRK)
#define NIMBLE_ARDUINO_2_x
#endif

#include <nvs.h>
#include <nvs_flash.h>

void delete_bonds_if_NimBLE_version_changed() {
  // This function checks if bonds are already present when changing from NimBLE 1.4.x to 2.0.x or from 2.0.x back to 1.4.x
  // In these cases, we have to delete the already existing bonds.
  // Otherwise the bonds will not work (when going from 1.4.x to 2.0.x) or the ESP32 will even crash (when going from 2.0.x back to 1.4.x).
  // See https://github.com/h2zero/NimBLE-Arduino/issues/740
  // The names of the NVS namespace and blobs used in this function can be seen here:
  // <nimble/nimble/host/store/config/src/ble_store_nvs.c>
  // NimBLE 1.4.x -> nimble core 1.4
  // NimBLE 2.0.x -> nimble core 1.5

  // startup: init flash
  esp_err_t err = nvs_flash_init();
  if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
    Serial.printf("nvs_flash_init() failed with error=%d, will erase flash\r\n", err);
    err = nvs_flash_erase();
    if (err != ESP_OK) {
      Serial.printf("nvs_flash_erase() failed with error=%d; will return\r\n", err);
      return;
    }
    err = nvs_flash_init();
    if (err != ESP_OK) {
      Serial.printf("nvs_flash_init() failed with error=%d, even after flash was erased; will return\r\n", err);
      return;
    }
  }
 
  // open NVS namespace "nimble_bond" where the bonds are stored
  nvs_handle_t nimble_bond_handle;
  err = nvs_open("nimble_bond", NVS_READWRITE, &nimble_bond_handle);
  if (err != ESP_OK) {
    Serial.printf("nvs_open 'nimble_bond' failed with error=%d, will return\r\n", err);
    return;
  }

  size_t required_size = 0;
  // Key generated during the pairing process. Present if a bond exists, used by NimBLE 1.4.x and NimBLE 2.0.x
  err = nvs_get_blob(nimble_bond_handle, "peer_sec_1", NULL, &required_size);
  bool bond_exists = (err == ESP_OK);
  // Resolvable Private Address (RPA): Bluetooth Device Address that changes periodically.
  // Only present in NimBLE 2.0.x
  err = nvs_get_blob(nimble_bond_handle, "rpa_rec_1", NULL, &required_size);
  bool rpa_exists = (err == ESP_OK);
  // Identity Resolving Key (IRK): Key used for Address Resolution (resolves an RPA).
  // Only present in NimBLE 2.0.x
  err = nvs_get_blob(nimble_bond_handle, "local_irk_1", NULL, &required_size);
  bool irk_exists = (err == ESP_OK);
  // and just for information, what an Identity Address is:
  // Identity Address: An address associated with an RPA that does not change over time. An IRK is required to resolve an RPA to its Identity Address.
 
  // Serial.printf("'peer_sec_1' present: %s; 'rpa_rec_1' present: %s; 'local_irk_1' present: %s\r\n", bond_exists ? "yes" : "no", rpa_exists ? "yes" : "no", irk_exists ? "yes" : "no");
  /*
  Running  Bonds in NVS            peer_sec_1  rpa_rec_1  local_irk_1  'nimble_bond' should be erased
  1.4.x    none                    NO          NO         NO
  1.4.x    from 1.4.x              YES         NO         NO
  1.4.x    from 2.0.x              YES         YES        YES          x   (otherwise will crash)
  1.4.x    from 2.0.x, deleted     NO          NO         Y/N (*)      (x) (just to be safe, would work without)
  2.0.x    none                    NO          NO         YES
  2.0.x    from 1.4.x              YES         NO         YES          x   (otherwise will not work)
  2.0.x    from 1.4.x, deleted     NO          NO         YES
  2.0.x    from 2.0.x              YES         YES        YES
  (*) YES or NO, depending on whether the ESP32 has rebooted at least once in 2.0.x after the bond was deleted
  */

  #if !defined(NIMBLE_ARDUINO_2_x)
  // We are in NimBLE 1.4.x. Check if we downgraded from NimBLE 2.0.x
  bool erase_nimble_partition = (rpa_exists || irk_exists);
  if (erase_nimble_partition) {
    Serial.printf("We are using NimBLE 1.4.x, but bonds from NimBLE 2.0.x are present. We have to delete all bonds, otherwise ESP32 will crash! Please bond your peers again.\r\n");
  }
  #else
  // We are in NimBLE 2.0.x. Check if we upgraded from NimBLE 1.4.x
  bool erase_nimble_partition = bond_exists && !(rpa_exists);
  if (erase_nimble_partition) {
    Serial.printf("We are using NimBLE 2.0.x, but bonds from NimBLE 1.4.x are present. We have to delete all bonds, otherwise they will not work! Please bond your peers again.\r\n");
  }
  #endif

  if (erase_nimble_partition) {
    nvs_erase_all(nimble_bond_handle);
    nvs_commit(nimble_bond_handle);
    nvs_close(nimble_bond_handle);
    // ESP needs to be restarted, because NVS data is still in nimble RAM
    Serial.printf("  NVS partition 'nimble_bond' was erased. Now we have to restart the ESP32 to also clear nimble RAM.\r\n");
    ESP.restart();
  } else {
    nvs_close(nimble_bond_handle);
  }
}
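
The rule applied in the two preprocessor branches above can also be isolated as a pure function, which makes it possible to check every row of the table without an ESP32. This is only a sketch: `NimbleGeneration` and `must_erase_bonds` are hypothetical names, and the three flags are assumed to come from the same `nvs_get_blob` checks as in the function above.

```cpp
// Which NimBLE-Arduino generation the running firmware was built with.
enum class NimbleGeneration { V1_4, V2_0 };

// Returns true if the "nimble_bond" data should be erased before NimBLEDevice::init.
inline bool must_erase_bonds(NimbleGeneration running,
                             bool bond_exists,  // "peer_sec_1" present
                             bool rpa_exists,   // "rpa_rec_1" present
                             bool irk_exists) { // "local_irk_1" present
    if (running == NimbleGeneration::V1_4) {
        // Any 2.0.x-only key means 2.0.x wrote to this storage: downgrade detected.
        return rpa_exists || irk_exists;
    }
    // Running 2.0.x: a bond without an RPA record was created by 1.4.x.
    return bond_exists && !rpa_exists;
}
```

For example, `must_erase_bonds(NimbleGeneration::V2_0, true, false, true)` corresponds to the row "2.0.x, with bonds from 1.4.x" and yields true, while a 2.0.x device with its own bonds (all three keys present) yields false.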

KlausMu avatar Dec 12 '24 20:12 KlausMu