
Lots of missing packets talking to a GigE camera: debugging suggestions?

Open · dkogan opened this issue 1 year ago · 1 comment

Hi. I'm seeing a consistent issue where I cannot receive even one complete image from my camera because of missing GigE packets. It's similar to #434. I can do my own debugging, but I'm hoping you can suggest some better ways to do that. Observations:

  • High-resolution GigE camera (Emergent HR-20000C; 20MP)
  • Linux. Fast machine, but very old kernel (3.10)
  • No missing packets when running the vendor's tools (they work)
  • No missing packets in tcpdump logs
  • Running as root enables the packet-socket method, and that fixes the problem. But the standard socket method is what is expected to be used most of the time, and that's supposed to work too; right?
  • MTU is already 9000, and increasing the socket buffer size and retention time doesn't appear to make a difference
  • Poking at the code, it looks like the packets are lost in the OS somewhere, not in aravis

So given all that, any suggestions? In your experience, are the packets being lost in the OS? If so, it should be possible to use perf, or something, to see conclusively where the packet loss is occurring. What do you do, usually?

Thank you very much for aravis. It's truly incredible. I'm now porting the 3rd set of my cameras to use it, and more or less, it's working perfectly.

dkogan, Apr 02 '24 05:04

I poked at this a bit more. In short, I wasn't setting the socket buffer size properly, and fixing that made it work.

Notes:

The socket buffer size can be set like this:

g_object_set (*stream,
              "socket-buffer",      ARV_GV_STREAM_SOCKET_BUFFER_FIXED,
              "socket-buffer-size", 20000000,
              NULL);

This sets the SO_RCVBUF option on the socket, which socket(7) describes like this:

SO_RCVBUF
    Sets or gets the maximum socket receive buffer in bytes. The
    kernel doubles this value (to allow space for bookkeeping
    overhead) when it is set using setsockopt(2), and this doubled
    value is returned by getsockopt(2). The default value is set by
    the /proc/sys/net/core/rmem_default file, and the maximum allowed
    value is set by the /proc/sys/net/core/rmem_max file. The minimum
    (doubled) value for this option is 256.
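One way to check whether a requested size actually took effect is to read the option back with getsockopt(2) and compare. This is only a sketch on a plain UDP socket, not what aravis does internally; the helper names set_and_read_rcvbuf and rcvbuf_was_capped are made up for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Request a receive buffer of `requested` bytes on `fd` and return the size
 * the kernel reports afterwards, or -1 on error. Per socket(7), Linux
 * reports double the accepted value, and the accepted value is limited by
 * /proc/sys/net/core/rmem_max. */
static int set_and_read_rcvbuf(int fd, int requested)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0)
        return -1;
    return effective;
}

/* Returns 1 if the reported size is smaller than double the request, which
 * suggests the kernel capped it; returns 0 otherwise. */
static int rcvbuf_was_capped(int requested, int effective)
{
    return (long long)effective < 2LL * (long long)requested ? 1 : 0;
}
```

For example, calling set_and_read_rcvbuf(fd, 20000000) on a UDP socket and getting back much less than 40000000 would mean rmem_max needs raising before a 20 MB buffer can take effect.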

The setting ends up in sk_rcvbuf in the kernel:

  • https://lxr.linux.no/#linux+v6.7.1/net/core/sock.c#L1250
  • https://lxr.linux.no/#linux+v6.7.1/net/core/sock.c#L960

That field is then used in multiple places in https://lxr.linux.no/#linux+v6.7.1/net/ipv4/udp.c

If the data comes in faster than it can be read, it's stored in this buffer; if the buffer is too small, the packets are thrown away. The logic and available diagnostics are in the udp.c file linked above. /proc/net/snmp records the dropped packets and there's a udp_fail_queue_rcv_skb tracepoint to catch some paths.
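To watch those drop counters from a program, something like the following could work. It is a rough sketch that assumes the usual /proc/net/snmp layout (a "Udp:" header line with column names followed by a "Udp:" line with values); udp_counter is a hypothetical helper, and RcvbufErrors and InErrors are the columns I'd expect to grow when the receive buffer overflows.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the next space-delimited token at or after `p` and store its
 * length in *len; return NULL when the line is exhausted. */
static const char *next_token(const char *p, size_t *len)
{
    p += strspn(p, " \n");
    *len = strcspn(p, " \n");
    return *len ? p : NULL;
}

/* Look up one counter by column name in the "Udp:" section of
 * /proc/net/snmp-style text read from `fp`. Returns 0 and stores the value
 * in *out on success, or -1 if the section or column is missing. */
static int udp_counter(FILE *fp, const char *field, unsigned long long *out)
{
    char header[1024], values[1024];

    while (fgets(header, sizeof(header), fp)) {
        if (strncmp(header, "Udp:", 4) != 0)
            continue;
        if (!fgets(values, sizeof(values), fp))
            return -1;

        /* Walk the name line and the value line in lockstep. */
        const char *h = header, *v = values;
        size_t hl, vl;
        while ((h = next_token(h, &hl)) && (v = next_token(v, &vl))) {
            if (strlen(field) == hl && strncmp(h, field, hl) == 0) {
                *out = strtoull(v, NULL, 10);
                return 0;
            }
            h += hl;
            v += vl;
        }
        return -1;
    }
    return -1;
}
```

Reading RcvbufErrors from /proc/net/snmp before and after an acquisition attempt and comparing the two values shows whether the kernel dropped datagrams in between. The counter is system-wide, so other UDP traffic on the machine can also move it.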

The buffer size should probably be set to a bit more than the ArvBuffer payload size. I would guess aravis already does that by default, but it wasn't working on my machine. I'm going to take a look to get more detail in a bit.
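A sketch of that sizing rule, plus a check against the rmem_max limit that socket(7) mentions. The 25% headroom in the example is my own guess, not a number from aravis, and all three helper names are hypothetical.

```c
#include <stdio.h>

/* Payload size plus a percentage of headroom. The right amount of headroom
 * is an assumption to tune, not a known-good value. */
static long long rcvbuf_target(long long payload_bytes, int headroom_percent)
{
    return payload_bytes + payload_bytes * headroom_percent / 100;
}

/* Read a single integer from a sysctl file such as
 * /proc/sys/net/core/rmem_max. Returns -1 if it cannot be read. */
static long long read_sysctl_ll(const char *path)
{
    long long value = -1;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    if (fscanf(fp, "%lld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* 1 if `target` fits under `rmem_max`, 0 if the kernel would cap it,
 * -1 if the limit is unknown (negative). */
static int target_fits(long long target, long long rmem_max)
{
    if (rmem_max < 0)
        return -1;
    return target <= rmem_max ? 1 : 0;
}
```

With a 20000000-byte payload and 25% headroom, rcvbuf_target gives 25000000. If target_fits reports 0 for that value against read_sysctl_ll("/proc/sys/net/core/rmem_max"), the limit has to be raised (for instance with sysctl net.core.rmem_max) before the "socket-buffer-size" request can be honored in full.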

dkogan, Apr 13 '24 04:04