connectedhomeip
connectedhomeip copied to clipboard
Strange buffer pointers in LwIPReceiveUDPMessage on ESP32
Problem
Derived from #19289 — added some pbuf logging on master and ran all-clusters-app on M5Stack (patch and sample log attached).
Every pbuf handed to LwIPReceiveUDPMessage()
seems to have a bad payload
pointer, outside the buffer. The pointers vary from run to run but are somewhat consistent within a run; for the same pbuf, the payload pointer is often (but not always) the same. The payload pointers always seem to be moderately close to the pbuf (within 32K), but seem to never be within the buffer.
Examples extracted from the attached log —
pbuf pointer pbuf->payload, offset len AllocSize
-------------- --------------------- ----------- -----------
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x0024 asz: 0x05D8
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x0028 asz: 0x05D8
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x0046 asz: 0x05D8
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x0056 asz: 0x05D8
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x005E asz: 0x05D8
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x00D8 asz: 0x05D8
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x017A asz: 0x05D8
UDP pb: 0x3fff07c0 pay: 0x3fff099c 476 len: 0x017B asz: 0x05D8
UDP pb: 0x3fff2ec4 pay: 0x3fff348c 1480 len: 0x0028 asz: 0x05D8
UDP < pb: 0x3fff23d0 pay: 0x3fff0f8c -5188 len: 0x0024 asz: 0x05D8
UDP < pb: 0x3fff23d0 pay: 0x3fff0f8c -5188 len: 0x0028 asz: 0x05D8
UDP < pb: 0x3fff23d0 pay: 0x3fff0f8c -5188 len: 0x0046 asz: 0x05D8
UDP < pb: 0x3fff23d0 pay: 0x3fff0f8c -5188 len: 0x005E asz: 0x05D8
UDP < pb: 0x3fff23d0 pay: 0x3fff0f8c -5188 len: 0x00A9 asz: 0x05D8
UDP < pb: 0x3fff23d0 pay: 0x3fff0f8c -5188 len: 0x00D6 asz: 0x05D8
UDP < pb: 0x3fff23d0 pay: 0x3fff0f8c -5188 len: 0x017A asz: 0x05D8
UDP < pb: 0x3fff2eb8 pay: 0x3fff0f8e -7978 len: 0x001A asz: 0x05D8
UDP > pb: 0x3ffee54c pay: 0x3fff1122 11222 len: 0x00B1 asz: 0x05D8
UDP > pb: 0x3fff0224 pay: 0x3fff3e06 15330 len: 0x0033 asz: 0x05D8
UDP > pb: 0x3fff0238 pay: 0x3fff0f8e 3414 len: 0x02E0 asz: 0x05D8
UDP > pb: 0x3fff0238 pay: 0x3fff5446 21006 len: 0x0033 asz: 0x05D8
UDP > pb: 0x3fff05e4 pay: 0x3fff0f8c 2472 len: 0x004D asz: 0x05D8
UDP > pb: 0x3fff05e4 pay: 0x3fff0f8c 2472 len: 0x005E asz: 0x05D8
UDP > pb: 0x3fff05e4 pay: 0x3fff0f8c 2472 len: 0x017A asz: 0x05D8
UDP > pb: 0x3fff05e4 pay: 0x3fff3762 12670 len: 0x0022 asz: 0x05D8
UDP > pb: 0x3fff05e4 pay: 0x3fff3762 12670 len: 0x002A asz: 0x05D8
UDP > pb: 0x3fff05e4 pay: 0x3fff3d56 14194 len: 0x02E0 asz: 0x05D8
UDP > pb: 0x3fff07fc pay: 0x3fff1040 2116 len: 0x0024 asz: 0x05D8
UDP > pb: 0x3fff07fc pay: 0x3fff1040 2116 len: 0x00A4 asz: 0x05D8
UDP > pb: 0x3fff07fc pay: 0x3fff1040 2116 len: 0x00A9 asz: 0x05D8
UDP > pb: 0x3fff07fc pay: 0x3fff1040 2116 len: 0x00D6 asz: 0x05D8
UDP > pb: 0x3fff07fc pay: 0x3fff1040 2116 len: 0x017A asz: 0x05D8
UDP > pb: 0x3fff07fc pay: 0x3fff1040 2116 len: 0x017B asz: 0x05D8
UDP > pb: 0x3fff0818 pay: 0x3fff348c 11380 len: 0x0046 asz: 0x05D8
UDP > pb: 0x3fff0818 pay: 0x3fff348c 11380 len: 0x00D6 asz: 0x05D8
UDP > pb: 0x3fff194c pay: 0x3fff3b32 8678 len: 0x0033 asz: 0x05D8
UDP > pb: 0x3fff194c pay: 0x3fff43fa 10926 len: 0x02E0 asz: 0x05D8
UDP > pb: 0x3fff1960 pay: 0x3fff3b30 8656 len: 0x004D asz: 0x05D8
UDP > pb: 0x3fff23d0 pay: 0x3fff348c 4284 len: 0x00D8 asz: 0x05D8
UDP > pb: 0x3fff23d0 pay: 0x3fff348e 4286 len: 0x0022 asz: 0x05D8
UDP > pb: 0x3fff2c0c pay: 0x3fff47ae 7074 len: 0x02E0 asz: 0x05D8
UDP > pb: 0x3fff2cec pay: 0x3fff41d4 5352 len: 0x0090 asz: 0x05D8
UDP > pb: 0x3fff2d00 pay: 0x3fff3b30 3632 len: 0x00D6 asz: 0x05D8
UDP > pb: 0x3fff2ec4 pay: 0x3fff348c 1480 len: 0x0090 asz: 0x05D8
UDP > pb: 0x3fff2ec4 pay: 0x3fff348c 1480 len: 0x00A4 asz: 0x05D8
UDP > pb: 0x3fff2ec4 pay: 0x3fff348c 1480 len: 0x00D6 asz: 0x05D8
UDP > pb: 0x3fff2ec4 pay: 0x3fff348c 1480 len: 0x017B asz: 0x05D8
UDP > pb: 0x3fff2ec4 pay: 0x3fff3b32 3182 len: 0x0033 asz: 0x05D8
UDP > pb: 0x3fff2ec4 pay: 0x3fff3e06 3906 len: 0x0033 asz: 0x05D8
UDP > pb: 0x3fff2ec4 pay: 0x3fff43fa 5430 len: 0x0033 asz: 0x05D8
0001-LOG.patch.txt log.chip-all-clusters-app.20220608181346.txt
@dhrishi
The payload pointer may be outside the pbuf depending how it's allocated. You'd have to trace the path for incoming packets
The payload pointer may be outside the pbuf depending how it's allocated. You'd have to trace the path for incoming packets
It looks like that is the case; the incoming buffers are PBUF_REF with the payload somewhere else.
The current PacketBuffer
code, and the LwIP receive path, fundamentally assume that the pbuf
and payload are part of one contiguous block, and use the space in betwwn, so this isn't going to work.
Oh, wow. So the fact that it ever worked at all was basically just luck? We were stomping on parts of the pbuf pool, but they happened to be ok to stomp on, by accident???
I think the reason it ‘works’ is that LwIP reads the whole low-level frame into a buffer, and then consumes up through the IP header, and returns a payload pointer to that point. So when the code reuses that space, by accessing it relative to the payload pointer, it works. #19244 tried accessing it relative to the header, and crashed.
EnsureReservedSize()
always claims success, because ReservedSize()
casts the pointer difference to unsigned, so everything looks like a large reserve.
I don't think there's any hope of replacing all the ‘reserve’ code and uses for 1.0, so I think the path forward is:
- Reconfigure incoming LwIP on ESP32 (and possibly others) to use one of the contiguous LwIP buffer types,
PBUF_RAM
orPBUF_POOL
(I don't know where to do this); - Document that
CHIP_SYSTEM_CONFIG_PACKETBUFFER_LWIP_PBUF_TYPE
must be one of these buffer types; - Wherever we get an LwIP pbuf (
LwIPReceiveUDPMessage()
andPacketBuffer::New()
that I know of), doVerifyOrDie(p->flags & PBUF_TYPE_FLAG_STRUCT_DATA_CONTIGUOUS)
.
So for incoming pbufs, the only thing we use the reserve stuff for is the packetinfo, right? I guess the problem is how we send through the packetbuffer and the packetinfo to the right Matter thread. Seems like the only options are:
- We do what you suggest above, send the packetinfo in the reserved space. This probably requires LwIP to do a copy of the data, right?
- We keep doing what we do now, with static asserts that a packetinfo is always smaller than the IP header .... Seems dubious.
- We allocate (heap? pool?) a struct that has the packetinfo and the pointer to the pbuf, and then pass that to the Matter thread. That avoids the pbuf copy but needs the extra allocation... which is likely smaller than the space we would need for the pbuf copy. In this setup, PacketBuffer::New can still check
PBUF_TYPE_FLAG_STRUCT_DATA_CONTIGUOUS
, because I think it should hold there....
- We do what you suggest above, send the packetinfo in the reserved space. This probably requires LwIP to do a copy of the data, right?
That may be so, though I don't know LwIP internals. (I speculate that it may be using the separate-payload model because some hardware may make a page-aligned payload advantageous.)
- We keep doing what we do now, with static asserts that a packetinfo is always smaller than the IP header .... Seems dubious.
- We allocate (heap? pool?) a struct that has the packetinfo and the pointer to the pbuf, and then pass that to the Matter thread.
Yes, @tcarmelveilleux suggested similar in #17213. But I naïvely thought that just fixing the alignment would be enough for now…
If the reserve concept doesn't get ripped out entirely (which would definitely simplify PacketBuffer), then ReservedSize()
should also check for the CONTIGUOUS flag and return zero otherwise.
If the reserve concept doesn't get ripped out entirely
I think we use it quite a bit on the save path to pre-reserve space for our headers....
then ReservedSize() should also check for the CONTIGUOUS flag and return zero otherwise.
Yes, that would make a lot of sense.
I checked how IPPacketInfo
class is used in UDPEndPointImplLwIP.cpp
. The packet info is copied in UDPEndPointImplLwIP::HandleDataReceived
.
We can modify the signature to void HandleDataReceived(chip::System::PacketBufferHandle && aBuffer, const PacketInfo &aPacketInfo);
and pass the packet info explicitly in the lambda.
Code will be like:
364 IPPacketInfo pktInfo;
365 pktInfo->SrcAddress = IPAddress(*addr);
366 pktInfo->DestAddress = IPAddress(*ip_current_dest_addr());
367 pktInfo->Interface = InterfaceId(ip_current_netif());
368 pktInfo->SrcPort = port;
369 pktInfo->DestPort = pcb->local_port;
370
371 ep->Retain();
372 CHIP_ERROR err = ep->GetSystemLayer().ScheduleLambda([ep, p = System::LwIPPacketBufferView::UnsafeGetLwIPpbuf(buf), pktInfo] {
373 ep->HandleDataReceived(System::PacketBufferHandle::Adopt(p), pktInfo);
374 ep->Release();
375 });
Will this eliminate the need to access the IP and link layer header of the pbuf? Doing such thing is always dangerous.
That's basically sweeping the heap-allocation of the IPPacketInfo under the rug into the lambda allocation, right? I would rather make any sort of sizeable allocations explicit, so if they fail we gracefully return error instead of throwing uncaught exceptions or whatever happens if we fail to allocate the lambda....
@kpschoedel In the SVE test the issue showed up again in group message tests.
In SessionManager.cpp
we are cloning the packet buffer:
659 while (!decrypted && iter->Next(groupContext))
660 {
661 // Optimization to reduce number of decryption attempts
662 if (groupId != groupContext.group_id)
663 {
664 continue;
665 }
666 msgCopy = msg.CloneData();
667 CryptoContext::NonceStorage nonce;
668 CryptoContext::BuildNonce(nonce, packetHeader.GetSecurityFlags(), packetHeader.GetMessageCounter(),
669 packetHeader.GetSourceNodeId().Value());
670 decrypted = (CHIP_NO_ERROR ==
671 SecureMessageCodec::Decrypt(CryptoContext(groupContext.key), nonce, payloadHeader, packetHeader, msgCopy));
672 }
The msg.CloneData()
fails because the pbuf does not fit in the packet buffer's memory model at all. Raised https://github.com/project-chip/connectedhomeip/pull/20923 as a quick fix.