snabb
assert failure in header:new_from_mem
Hello snabbers,
In my apps, I am using the following idiom to process packets at multiple places.
local p = link.receive(input)
local data, length = p.data, p.length
local eth_pkt = self.eth_pkt:new_from_mem(data, length)
After processing packets successfully for some time, I get this error:
core/main.lua:137: in function <core/main.lua:135>
[C]: in function 'error'
core/main.lua:26: in function 'assert'
lib/protocol/header.lua:230: in function 'new_from_mem'
I have seen this error from multiple apps in the app network that call new_from_mem, so I cannot localise the issue to a single app or invocation. The assert in question is:
function header:new_from_mem (mem, size)
   local o = _new(self)
   local header = o._header
   assert(ffi.sizeof(header.t) <= size)
   header.box[0] = ffi.cast(header.ptr_t, mem)
   return o
end
I am not sure what triggers this assert failure: it happens randomly after the apps have run fine for some time, and at random places where new_from_mem is invoked. Any pointers as to where I should be looking?
The problem is that the data packet is smaller than the size of the protocol header, so it's impossible to parse. Your app is expecting to receive an ethernet frame (inferring from the name eth_pkt) but is getting something that can't possibly be one.
Ah, ok thanks, this gives me something to start looking at!
I added an assert and a check for length before calling new_from_mem to figure out what was happening.
while not link.empty(input) do
   local p = link.receive(input)
   local data, length = p.data, p.length
   if (length < 14) then
      print ("Dwarf packet length is " .. length)
   end
   assert(length >= 14)
The output I am getting is:
Dwarf packet length is 0
Conversation on Slack indicates that this might be a bug in intel_mp, similar to some that were caught in the older 82599 driver in its development phase. Just putting down this note for the record :)
Update:
I have tried using apps.intel.intel_app but I am still getting this error. So this may not be a bug in intel_mp.
I have added a condition to ignore zero-length packets. So far there seem to be no other issues.
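A filter of this kind could look roughly like the sketch below. It is only an illustration, not the actual app code: `link` and `packet` are small stand-in tables that mimic Snabb's `link.empty`, `link.receive` and `packet.free` so that the snippet runs in plain Lua, and `ETHER_HEADER_SIZE`, `pull_valid` and the `handle` callback are hypothetical names. It compares against the Ethernet header size (14 bytes) instead of zero, on the assumption that any packet shorter than the header would trip the same assert.

```lua
-- Stand-ins for Snabb's link and packet modules so the sketch runs in plain Lua.
-- Inside a real Snabb app the real `link` and `packet` would be used instead.
local link = {
   empty   = function (l) return #l == 0 end,
   receive = function (l) return table.remove(l, 1) end,
}
local packet = {
   free = function (p) p.freed = true end,
}

local ETHER_HEADER_SIZE = 14  -- bytes: dst MAC (6) + src MAC (6) + ethertype (2)

-- Drain `input`, hand packets long enough to hold an Ethernet header to
-- `handle`, and free the rest. Returns the number of packets dropped.
local function pull_valid (input, handle)
   local dropped = 0
   while not link.empty(input) do
      local p = link.receive(input)
      if p.length < ETHER_HEADER_SIZE then
         packet.free(p)   -- a dropped packet still has to be freed
         dropped = dropped + 1
      else
         handle(p)
      end
   end
   return dropped
end

-- Usage with fake packets (only the length field matters here).
local input = { {length = 64}, {length = 0}, {length = 60} }
local seen = {}
local dropped = pull_valid(input, function (p) seen[#seen + 1] = p.length end)
print("dropped " .. dropped .. ", handled " .. #seen)
```

Returning the drop count makes it easy to feed a per-app counter, which helps when the short packets show up in more than one app.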
Another observation: say my app network looks like the following diagram:
NIC -> [a] -> [b] -> [c] -> NIC
Most of the time the zero-length packets are received in a, but very rarely I get this error in b and c as well. As mentioned, I am ignoring zero-length packets in a, but I still get errors in b or c.
This error happens randomly, or at least I have not been able to isolate any pattern so far, which is one of the reasons why making a small test case is difficult.
Some more updates:
I printed the packet counts from the NIC up to my app to check whether any packets are getting lost on the way.
Printed the following stats from the shm directory:
pci/0000\:04\:00.0/q0_rxpackets.counter
links/south_if.output\ -\>\ sort.input/rxpackets.counter
links/south_if.output\ -\>\ sort.input/txpackets.counter
apps/sort/rxpackets.counter
apps/sort/zeropackets.counter
In the sort app, rxpackets.counter is incremented immediately after link.receive() if length > 0; otherwise zeropackets.counter is incremented.
The values of these counters after running for some time are:
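The counting rule described above can be sketched as follows. This is a simplified stand-in, not the sort app itself: plain Lua numbers in a hypothetical `counters` table take the place of the shm counters, and `count_packet` is an illustrative name.

```lua
-- Plain Lua numbers stand in for the app's shm counters in this sketch.
local counters = { rxpackets = 0, zeropackets = 0 }

-- Classify one received packet by its length.
-- Returns true if the packet should be processed further.
local function count_packet (p)
   if p.length > 0 then
      counters.rxpackets = counters.rxpackets + 1
      return true
   else
      counters.zeropackets = counters.zeropackets + 1
      return false
   end
end

-- Feed a few fake packets through the classifier.
for _, len in ipairs({64, 0, 128, 0, 60}) do
   count_packet({length = len})
end
print(counters.rxpackets, counters.zeropackets)
```

With this rule every received packet lands in exactly one of the two counters, so their sum should equal the number of packets the link delivered to the app.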
pci_rxpkt 10284438
link_rxpkt 10284461
link_txpkt 10284461
sort_rxpkt 10284051
zero_pkt 410
difference 410
Here, difference is link_txpkt minus sort_rxpkt.
As can be seen, the difference matches zero_pkt exactly. So all packets received by the link are being transmitted to the app, but some of them turn out to be of zero length.
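For reference, the arithmetic behind this comparison can be checked with a few lines of Lua. The numbers are the ones listed above; the `stats` table and the variable names are only for illustration.

```lua
-- Counter values copied from the listing above.
local stats = {
   pci_rxpkt  = 10284438,
   link_rxpkt = 10284461,
   link_txpkt = 10284461,
   sort_rxpkt = 10284051,
   zero_pkt   = 410,
}

-- Packets the link handed over that the app did not count as non-empty.
local difference = stats.link_txpkt - stats.sort_rxpkt

-- The link itself lost nothing if its rx and tx counts agree.
local link_lossless = (stats.link_rxpkt == stats.link_txpkt)

print(difference, difference == stats.zero_pkt, link_lossless)
```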