zeek-af_packet-plugin
AF_Packet plugin fails to recover after network service restart
Steps to reproduce:
- Start one or more Zeek workers using the af_packet plugin
- Restart networking
- Observe that worker CPU usage goes to 100% and the workers stop receiving packets
Expected outcome: the check if ( ! rx_ring->GetNextPacket(&packet) ) should surface an error when the network interface is unavailable and no longer delivering packets. The worker should then crash, or the plugin should re-open the socket.
I have a fix for a similar plugin that uses a Testimony socket rather than AF_Packet, but I'm not familiar enough with this plugin to know how to solve it here.
Thanks for your suggestion. Given that AF_Packet just reads from a memory area that is shared between the kernel and the application, I think there is no immediate feedback if the socket was killed. That is, just by reading the RX ring, the plugin cannot distinguish between a situation where no packets are being received and a situation where the socket no longer exists. I could imagine defining a threshold for how long no packets are received and then calling getsockopt() to check whether the underlying socket is still there. However, control flow might never return to the plugin if Zeek uses the selectable file descriptor to determine whether a new packet is available. So, to be honest, I am not sure it's worth digging into this. Is there a use case in which this situation would be more than a rare exception?
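A minimal sketch of the threshold-plus-getsockopt() idea, for illustration only. The names PendingSocketError and IdleSocketChecker are hypothetical and not part of the plugin or of Zeek. It assumes Linux and assumes the kernel records a pending error on the socket when the interface goes away, which would need to be verified for AF_Packet. As noted above, it also only helps if the plugin's code is actually called while the ring is empty.

```cpp
#include <cerrno>
#include <chrono>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns 0 if the socket reports no pending error,
// otherwise an errno value (for example EBADF if the descriptor is closed).
// Note that reading SO_ERROR also clears the pending error on the socket.
int PendingSocketError(int fd) {
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;  // getsockopt() itself failed
    return err;        // 0, or the error the kernel stored on the socket
}

// Hypothetical helper: probes the socket only after the ring has been
// empty for at least `threshold`, so the common path stays syscall-free.
class IdleSocketChecker {
public:
    using Clock = std::chrono::steady_clock;

    IdleSocketChecker(int fd, std::chrono::milliseconds threshold)
        : fd_(fd), threshold_(threshold), last_activity_(Clock::now()) {}

    // Call whenever a packet was read from the ring.
    void OnPacket() { last_activity_ = Clock::now(); }

    // Call whenever the ring was empty. Returns 0 while within the
    // threshold or if the socket looks healthy, otherwise an errno value.
    int OnEmptyRing() {
        const auto now = Clock::now();
        if (now - last_activity_ < threshold_)
            return 0;
        last_activity_ = now;  // rate-limit: at most one probe per threshold
        return PendingSocketError(fd_);
    }

private:
    int fd_;
    std::chrono::milliseconds threshold_;
    Clock::time_point last_activity_;
};
```

A caller would invoke OnPacket() after each successful read and OnEmptyRing() when the ring yields nothing; a non-zero result would be the point at which to report an error or re-open the socket.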
That's fair and thanks for the response!
I operate a pretty large fleet of hosts that use bonded network interfaces, and with enough hosts and changes, a rare exception isn't as rare as I'd hope. I've since moved to a different plugin that reads packets from Testimony, so this issue isn't affecting me anymore. I figured I'd bring it up to hopefully save others some of the hair pulling I went through while debugging this, but I also understand this might not be worth the trouble.