
assert failure in header:new_from_mem

raj2569 opened this issue 8 years ago • 6 comments

Hello snabbers,

In my apps, I am using the following idiom to process packets in multiple places.

  local p = link.receive(input)
  local data, length = p.data, p.length
  local eth_pkt = self.eth_pkt:new_from_mem(data, length)

After processing packets successfully for some time, I get this error:

    core/main.lua:137: in function <core/main.lua:135>
    [C]: in function 'error'
    core/main.lua:26: in function 'assert'
    lib/protocol/header.lua:230: in function 'new_from_mem'

I have hit this error in multiple apps in the app network, wherever new_from_mem is called, so it's not possible to localise the issue to a single app or call site. The assert in question is:

function header:new_from_mem (mem, size)
   local o = _new(self)
   local header = o._header
   assert(ffi.sizeof(header.t) <= size)
   header.box[0] = ffi.cast(header.ptr_t, mem)
   return o
end

I am not sure what triggers this assert failure: it happens randomly, after running fine for some time, and at random places where new_from_mem is invoked. Any pointers as to where I should be looking?

raj2569, May 12 '17 12:05

The assertion fails because the packet is shorter than the protocol header (ffi.sizeof(header.t) bytes), so it can't be parsed. Your app expects to receive an Ethernet frame (judging by the name eth_pkt) but is getting something that can't possibly be one.
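A minimal sketch of checking the length before parsing, so that short frames are rejected instead of tripping the assert. The eth table is a hypothetical stand-in for the Ethernet header class (the real one wraps an FFI struct whose size is 14 bytes: two 6-byte MAC addresses plus a 2-byte ethertype), and try_new_from_mem is a hypothetical helper name, not part of lib/protocol.

```lua
-- Hypothetical stand-in for lib/protocol's Ethernet header class.
-- The real header:new_from_mem() asserts ffi.sizeof(header.t) <= size;
-- here header_size plays the role of ffi.sizeof(header.t).
local eth = { header_size = 14 }  -- 6 (dst MAC) + 6 (src MAC) + 2 (ethertype)

function eth:new_from_mem(mem, size)
   assert(self.header_size <= size, "packet shorter than header")
   return { mem = mem, size = size }
end

-- Hypothetical helper: return nil for packets too short to hold the
-- header, instead of raising an error inside new_from_mem.
local function try_new_from_mem(class, mem, size)
   if size < class.header_size then
      return nil
   end
   return class:new_from_mem(mem, size)
end

-- A zero-length packet makes the unguarded call raise an error,
-- while the guarded call simply returns nil.
print(pcall(eth.new_from_mem, eth, "mem", 0))  -- false, plus the error message
print(try_new_from_mem(eth, "mem", 0))          -- nil
```

The caller then decides what to do with a nil result (typically drop and free the packet) rather than having the whole app network abort.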

alexandergall, May 12 '17 13:05

Ah, OK, thanks! This gives me something to start looking into.

raj2569, May 12 '17 13:05

I added an assert and a check for length before calling new_from_mem to figure out what was happening.

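   while not link.empty(input) do
      local p = link.receive(input)
      local data, length = p.data, p.length
      -- 14 bytes is the minimum Ethernet header (6 dst MAC + 6 src MAC + 2 ethertype)
      if length < 14 then
         print("Dwarf packet length is " .. length)
      end
      assert(length >= 14)
      -- ... rest of the packet processing ...
   end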

The output I am getting is:

Dwarf packet length is 0

raj2569, May 16 '17 13:05

Conversation on Slack indicates that this might be a bug in intel_mp, similar to some that were caught in the older 82599 driver in its development phase. Just putting down this note for the record :)

wingo, May 19 '17 12:05

Update:

I have tried using apps.intel.intel_app instead, but I am still getting this error. So this may not be a bug in intel_mp.

I have added a condition to ignore packets with zero bytes. So far there seem to be no other issues.
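A minimal sketch of such a condition in an app's push loop, assuming the usual Snabb pattern of link.receive/link.transmit, and freeing dropped packets with packet.free so they are not leaked. To keep it runnable outside Snabb, link and packet below are tiny hypothetical table-based stand-ins for the core modules, and push takes the links as plain arguments.

```lua
-- Hypothetical stand-ins for Snabb's core.link and core.packet modules,
-- just enough to run this sketch outside Snabb. A link is modelled as a
-- FIFO array of packets; a packet is a table with a length field.
local link = {}
function link.empty(l) return #l == 0 end
function link.receive(l) return table.remove(l, 1) end
function link.transmit(l, p) l[#l + 1] = p end

local packet = { freed = 0 }
function packet.free(p) packet.freed = packet.freed + 1 end

-- Forward packets from input to output, dropping zero-length ones.
-- Dropped packets are freed rather than silently discarded.
local function push(input, output)
   while not link.empty(input) do
      local p = link.receive(input)
      if p.length == 0 then
         packet.free(p)
      else
         link.transmit(output, p)
      end
   end
end

local input = { { length = 60 }, { length = 0 }, { length = 1514 } }
local output = {}
push(input, output)
print(#output, packet.freed)  -- 2 forwarded, 1 dropped
```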

Another observation: suppose my app network looks like the following diagram:

NIC -> [a] -> [b] -> [c] -> NIC

Most of the time the zero-length packets arrive in a, but very rarely I also get this error in b and c. As I mentioned, I am ignoring zero-length packets in a, yet I still occasionally get errors in b or c.

This error happens randomly, or at least so far I have not been able to isolate any pattern, which is one of the reasons why making a small test case is difficult.

raj2569, May 30 '17 15:05

Some more updates:

I printed the packet counts at each stage from the NIC to my app, to check whether any packets are getting lost along the way.

I printed the following counters from the shm directory:

pci/0000\:04\:00.0/q0_rxpackets.counter
links/south_if.output\ -\>\ sort.input/rxpackets.counter
links/south_if.output\ -\>\ sort.input/txpackets.counter
apps/sort/rxpackets.counter
apps/sort/zeropackets.counter

In the sort app, rxpackets.counter is incremented immediately after link.receive() if length > 0; otherwise zeropackets.counter is incremented.
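A minimal sketch of that counting logic, with plain Lua numbers standing in for the shm counters (the real app would presumably use Snabb's core.counter objects). The count function and the field names in counters are hypothetical.

```lua
-- Plain-table stand-ins for the sort app's shm counters
-- (apps/sort/rxpackets.counter and apps/sort/zeropackets.counter).
local counters = { rxpackets = 0, zeropackets = 0 }

-- Called right after link.receive(): each packet increments exactly
-- one of the two counters, depending on its length.
local function count(p)
   if p.length > 0 then
      counters.rxpackets = counters.rxpackets + 1
   else
      counters.zeropackets = counters.zeropackets + 1
   end
end

local received = { { length = 64 }, { length = 0 }, { length = 128 }, { length = 0 } }
for _, p in ipairs(received) do
   count(p)
end
print(counters.rxpackets, counters.zeropackets)  -- 2 non-empty, 2 zero-length
```

Because every received packet lands in exactly one of the two counters, their sum should equal the number of packets taken off the link.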

The values of these counters after running for some time are:

pci_rxpkt     10284438
link_rxpkt    10284461
link_txpkt    10284461
sort_rxpkt    10284051
zero_pkt           410
difference         410

difference is link_txpkt minus sort_rxpkt.

As can be seen, the difference matches zero_pkt exactly. So all packets transmitted on the link do reach the app, but some of them turn out to have zero length.
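The accounting can be checked directly with the counter values from the table: since each packet taken off the link increments either sort_rxpkt or zero_pkt, link_txpkt should equal their sum.

```lua
-- Counter values copied from the table above.
local link_txpkt = 10284461
local sort_rxpkt = 10284051
local zero_pkt   = 410

local difference = link_txpkt - sort_rxpkt
print(difference)              -- 410
print(difference == zero_pkt)  -- true
```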

raj2569, May 31 '17 08:05