
GAP callback when receiving HCI completed packets

Open tannewt opened this issue 2 years ago • 8 comments

Hi! We're starting to use NimBLE in CircuitPython for the ESP32-S3. We have existing code for BLE that uses the Nordic soft device.

One of our APIs, similar to the Nordic UART service, batches data written (without response) into larger ATT packets, up to the MTU size. We do this by batching until the previous packet has been sent. With the soft device we get a BLE_GATTC_EVT_WRITE_CMD_TX_COMPLETE event back that we use to queue up what we've buffered.

I'm looking into adding something similar to NimBLE. I've found ble_hs_hci_evt_num_completed_pkts() and it looks like I can add a GAP event based on that. Is that the best approach? I can make a PR once I have it working. Or, is there another way I should get outgoing flow control info? Thanks!
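
For concreteness, here's roughly the pattern I have in mind, written against a hypothetical BLE_GAP_EVENT_WRITE_CMD_TX event (the event name and the placeholder define are illustrative only; nothing like it exists in NimBLE today, and the buffer-management helpers are ours):

```c
#include <stdbool.h>
#include "host/ble_hs.h"

/* Hypothetical: stands in for the proposed completed-packets GAP event. */
#define BLE_GAP_EVENT_WRITE_CMD_TX 0x7F   /* placeholder value */

static uint8_t  tx_buf[512];              /* staged bytes, up to the MTU */
static uint16_t tx_len;
static bool     tx_in_flight;
static uint16_t uart_conn, uart_attr;     /* set at connect/discovery time */

static void uart_flush(void)
{
    if (tx_len == 0 || tx_in_flight) {
        return;
    }
    struct os_mbuf *om = ble_hs_mbuf_from_flat(tx_buf, tx_len);
    /* NimBLE consumes the mbuf whether or not the call succeeds */
    if (om != NULL && ble_gattc_write_no_rsp(uart_conn, uart_attr, om) == 0) {
        tx_in_flight = true;
        tx_len = 0;
    }
}

static int uart_gap_cb(struct ble_gap_event *event, void *arg)
{
    if (event->type == BLE_GAP_EVENT_WRITE_CMD_TX) {  /* hypothetical */
        tx_in_flight = false;  /* previous packet confirmed by controller */
        uart_flush();          /* send whatever was batched meanwhile */
    }
    return 0;
}
```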

tannewt avatar Feb 09 '22 01:02 tannewt

We have BLE_GAP_EVENT_NOTIFY_TX for notifications; I have no objections to adding something similar for writes without response.

sjanc avatar Mar 03 '22 13:03 sjanc

I have a similar application. One problem is that BLE_GAP_EVENT_NOTIFY_TX is not called at a helpful time. It seems to be called directly from inside the send function (e.g. ble_gattc_notify_custom), and not when the actual data is sent. This can even cause crashes due to recursion if you try to send more data from that callback.

For example, if you naively implement a stream using this event as feedback, you could get a callstack looking something like this: ble_gattc_notify_custom -> ble_gap_event -> ble_gattc_notify_custom -> ble_gap_event -> ble_gattc_notify_custom -> (stack overflow)
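
Concretely, the naive pattern is the following (real NimBLE calls; next_chunk() and CHUNK_LEN are placeholders for the application's data source):

```c
#include "host/ble_hs.h"

#define CHUNK_LEN 20                        /* placeholder chunk size */
extern const uint8_t *next_chunk(void);     /* placeholder data source */

/* Anti-pattern: BLE_GAP_EVENT_NOTIFY_TX fires synchronously from inside
 * ble_gattc_notify_custom(), so sending again from the handler recurses. */
static int gap_cb(struct ble_gap_event *event, void *arg)
{
    if (event->type == BLE_GAP_EVENT_NOTIFY_TX &&
        event->notify_tx.status == 0) {
        struct os_mbuf *om = ble_hs_mbuf_from_flat(next_chunk(), CHUNK_LEN);
        if (om != NULL) {
            /* re-enters gap_cb() before this call returns:
             * unbounded recursion, eventually a stack overflow */
            ble_gattc_notify_custom(event->notify_tx.conn_handle,
                                    event->notify_tx.attr_handle, om);
        }
    }
    return 0;
}
```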

Ideally there should be a separate "application tx flow control" callback that would:

  • Always be called just before the connection interval.
  • Supply information about how much data is already queued in the controller and host. We want to keep these buffers/queues mostly empty to get the lowest latency.

The goal here is that after receiving this callback, the application can fill a packet up to the MTU size and have it be sent right away. This setup would give the lowest latency while maintaining decent throughput.
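
As a strawman, it might look something like this (entirely hypothetical; no such API exists in NimBLE):

```c
#include <stdint.h>

/* Hypothetical API sketch for the flow-control callback described above. */
struct ble_tx_flow_info {
    uint16_t conn_handle;
    uint16_t pkts_queued_controller;  /* ACL packets awaiting transmission */
    uint16_t pkts_queued_host;        /* fragments still held by the host */
};

/* Would be invoked by the host shortly before each connection event, so
 * the application can top up the queue just in time. */
typedef void ble_tx_flow_fn(const struct ble_tx_flow_info *info, void *arg);
```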

Right now I am thinking of just hacking in my own callback somewhere around ble_hs_hci_evt_num_completed_pkts as well.

Rubeer avatar Jan 16 '23 16:01 Rubeer

Related https://github.com/adafruit/circuitpython/issues/5926.

Anutrix avatar Nov 05 '23 19:11 Anutrix

I'm trying to implement high bandwidth streaming from the BLE server to the client, using Notify, which is something I've done before with another BLE stack.

Discovered the same thing @Rubeer found. BLE_GAP_EVENT_NOTIFY_TX is pretty much useless, since it only indicates the packet has been queued, not when it's transmitted. Since it's called directly from ble_gattc_notify_custom(), it's hard to imagine a use case where the code in the callback that handles the event couldn't just as easily have been called from wherever ble_gattc_notify_custom() was called.

From @sjanc's response, I wonder if the NimBLE devs were aware it operates this way?

I'm not after low latency, but high bandwidth. To achieve high bandwidth it's necessary to queue multiple packets per connection interval. If one just queues as many notifies (or anything) as possible, all memory on the controller is exhausted and it can't respond to incoming packets. And one gets a huge buffer of packets that adds lots of latency.

The way I solved this with Nordic's stack was to limit the number of outstanding buffered packets to some number. It needs to be at least the number that can be sent in one connection interval. This can be done by keeping a count of notifies as they are queued and decrementing it when they are transmitted. But that needs a callback when TX occurs, which NimBLE appears to lack.
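
The counting scheme itself is simple; a sketch (the on_tx_complete() hook at the bottom is the piece NimBLE lacks, so its name is hypothetical):

```c
#include "host/ble_hs.h"

/* Credit-based flow control: allow at most MAX_PENDING un-transmitted
 * notifications; MAX_PENDING should be at least the number of packets
 * sent per connection interval.  Assumes one ACL packet per notify --
 * see the fragmentation caveat below. */
#define MAX_PENDING 4

static volatile int pending;

int stream_notify(uint16_t conn, uint16_t attr, struct os_mbuf *om)
{
    if (pending >= MAX_PENDING) {
        os_mbuf_free_chain(om);          /* no credit; caller retries */
        return BLE_HS_ENOMEM;
    }
    int rc = ble_gattc_notify_custom(conn, attr, om);
    if (rc == 0) {
        pending++;
    }
    return rc;
}

/* Hypothetical hook: would fire from the Number Of Completed Packets
 * HCI event path with the number of packets the controller finished. */
void on_tx_complete(uint16_t conn, uint16_t num_pkts)
{
    (void)conn;
    pending -= num_pkts;                 /* credits freed; resume sending */
}
```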

I'm coming to the same conclusion as @tannewt, that I need to insert some sort of callback into ble_hs_hci_evt_num_completed_pkts().

The problem is that nothing in NimBLE keeps track of what packets were sent to the controller. So there's no way to dispatch a callback that identifies the specific Notify which has been completed when the HCI event reports completed packets.

The packet count alone in a callback isn't enough for an application to do flow control. An ATT packet can be fragmented by L2CAP into multiple link-layer packets. And NimBLE sends responses to various requests on its own, which queues packets the application doesn't know about.

So if the app queues four Notify ops and gets four completed packets in a callback, that could mean any of: all four notifies have completed; one notify was split into four fragments and completed, with three notifies still outstanding; or four responses to connection parameter requests were transmitted and all four notifies are still fully queued.

xyzzy42 avatar Dec 30 '23 10:12 xyzzy42

I wondered how Espressif could claim 90 kB/sec BLE throughput, when the lack of flow control means one needs to use indicate, write with response, or read, and these kill throughput.

The answer is that they only do throughput testing on bluedroid, not on NimBLE. On bluedroid there is a call, L2CA_GetCurFreePktBufferNum_LE(), that allows for some sort of flow control. Throughput is maximized by driving the number of free buffers to zero. If one cared about latency too, you'd also need the maximum number of buffers, to figure out how many are in use, and then control that value to whatever maximum buffering latency you want.
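
The demo's pattern boils down to this (treat it as pseudocode; the exact bluedroid signature may vary between IDF versions, and the streaming flag and send helper are app-defined):

```c
#include <stdbool.h>
#include <stdint.h>

extern volatile bool streaming;              /* app-defined */
extern void send_next_notification(void);    /* app-defined */

/* Signature as in recent bluedroid headers; may differ by version. */
extern uint16_t L2CA_GetCurFreePktBufferNum_LE(void);

void throughput_loop(void)
{
    while (streaming) {
        if (L2CA_GetCurFreePktBufferNum_LE() > 0) {
            send_next_notification();
        }
        /* no completion event: the loop spins until a buffer frees up */
    }
}
```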

The throughput test is also unrealistic in that it gets no notifications of completed packets. Instead it busy-polls the number of available buffers, which is too inefficient for a real-world application.

xyzzy42 avatar Dec 30 '23 11:12 xyzzy42

Hi,

We've been discussing this with @andrzej-kaczmarek a bit and have some thoughts on how something based on num completed packets could be implemented.

Since num completed packets covers all traffic, not only GATT, we would provide a generic mechanism where the sender gets back an ID that can later be matched in a specific GAP event via a listener. This way both the application and the host would be able to use and match those IDs. Tracking IDs against num completed packets would be handled by the host, so the upper layer would only wait for a specific ID to show up in the listener. Details of how this would be implemented are TBD, but we would prefer to avoid using the TX mbuf for that (so it isn't held until transmission, in case host and controller are split).
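
Very roughly, the shape might be something like this (purely illustrative; all names are hypothetical and, as noted, the real design is TBD):

```c
#include <stdint.h>

struct os_mbuf;

/* Hypothetical send variant: hands back an ID that the host tracks
 * against Number Of Completed Packets. */
int ble_gattc_write_no_rsp_id(uint16_t conn_handle, uint16_t attr_handle,
                              struct os_mbuf *om, uint32_t *out_id);

/* Hypothetical GAP event payload, delivered to a registered listener
 * once the controller reports the matching packets completed. */
struct ble_gap_event_tx_complete {
    uint16_t conn_handle;
    uint32_t id;        /* matches the ID returned by the send call */
};
```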

As a short-term solution for high-throughput applications, I'd suggest feeding data periodically and handling any ENOMEM errors.
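
i.e. something along these lines (sketch; the have_data()/peek_data()/peek_len()/consume_data() helpers are app-defined placeholders):

```c
#include <stdbool.h>
#include "host/ble_hs.h"

/* app-defined placeholders */
extern bool have_data(void);
extern const void *peek_data(void);
extern uint16_t peek_len(void);
extern void consume_data(void);

/* Feed the stack until it runs out of buffers, then back off and retry
 * on the next period.  BLE_HS_ENOMEM is the real NimBLE error code for
 * buffer exhaustion. */
static void feed_stream(uint16_t conn, uint16_t attr)
{
    while (have_data()) {
        struct os_mbuf *om = ble_hs_mbuf_from_flat(peek_data(), peek_len());
        if (om == NULL) {
            break;                      /* out of mbufs; retry later */
        }
        int rc = ble_gattc_notify_custom(conn, attr, om);
        if (rc != 0) {
            /* typically BLE_HS_ENOMEM: buffers full, retry next period */
            break;
        }
        consume_data();                 /* only consume once accepted */
    }
}
```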

sjanc avatar Jan 02 '24 11:01 sjanc

I've already tried the ENOMEM approach, but it has two major flaws. One is that the latency caused by such deep buffering is high. Anything with a real-time component needs to limit the number of queued packets, usually to the number that can be sent in one connection interval.

The other is that with all buffer memory used, incoming packets cannot be responded to. For instance, an incoming write to a CCCD to turn off the notify subscription!

I see that NimBLE already tracks the number of outstanding packets per connection. So I added a hook to get this information and plumbed it all the way down to aioble in MicroPython, and that is working for flow control. While this count includes all packets, not just GATT, that is actually better for me. Every packet takes buffer space and a time slot in the connection, GATT or not; if more non-GATT packets are sent, there is less space for GATT packets, so flow control needs to consider them.

This avoids all the extra bookkeeping of tracking per-packet IDs and matching them up as packets complete, because I don't need to know which packets completed, only how many are outstanding.
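
For reference, the hook amounts to roughly this (names are hypothetical; it mirrors my patch, not an upstream NimBLE API):

```c
#include <stdint.h>

/* Hypothetical hook invoked from the host's Number Of Completed
 * Packets handling with the per-connection outstanding count. */
typedef void ble_tx_window_fn(uint16_t conn_handle,
                              uint16_t outstanding_pkts, void *arg);

static ble_tx_window_fn *tx_window_cb;
static void *tx_window_arg;

void ble_hs_set_tx_window_cb(ble_tx_window_fn *cb, void *arg)
{
    tx_window_cb = cb;
    tx_window_arg = arg;
}

/* Inside ble_hs_hci_evt_num_completed_pkts(), after the per-connection
 * count is decremented, the patch adds roughly:
 *
 *     if (tx_window_cb != NULL) {
 *         tx_window_cb(conn_handle, outstanding, tx_window_arg);
 *     }
 */
```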

I also added a callback to send the packet-completed event back out, but I'm not happy with that design. AFAICT, NimBLE has just one event listener that receives all events, with no way to mask them. A per-packet completed event is a lot of extra events to receive and not act on; all existing users would just ignore them. It would be better if there were some way to receive the events only if the user cared about flow control.

xyzzy42 avatar Jan 02 '24 19:01 xyzzy42

MicroPython is now streaming data over BLE at over 110 kB/sec. I didn't even need to write the streaming code in C. It's actually better than the ESP32 throughput demo (bluedroid only!) because it doesn't exhaust all of the controller's memory and it doesn't busy-poll.

So, to the next person who finds this two-year-old thread: it can be done. NimBLE just needed some extra features added.

xyzzy42 avatar Jan 08 '24 19:01 xyzzy42