aravis
GVSP fails when client has Multiple NICs
Describe the bug GVSP fails to connect to the client when the client is running on a machine with multiple network interfaces.
To Reproduce I am running the EPICS AreaDetector Client https://github.com/areaDetector/ADAravis to connect to an AVT Manta. All GVCP communication works fine, but when I initiate acquisition the client sees no GVSP packets. I have done a network trace and can see that the client passes the ephemeral port for GVSP to the camera; however, the IP address it passes along with the port belongs to the wrong network interface and is not routable from the camera.
If I run the same code on a different client with a single NIC then the streaming works correctly.
Camera description:
- Manufacturer AVT
- Model Manta G-235 PoE
- Interface: Ethernet UDP
Platform description:
- Aravis version
- OS: Ubuntu 20.04
- Hardware x86_64
Could you attach the output of arv-tool-0.8 -d all:3?
Thanks for getting back to me.
The results are:
arv-tool-0.8 -d all:3
Found 0 USB3Vision device (among 5 USB devices)
[GvDiscoverSocket::new] Add interface 127.0.0.1
[GvDiscoverSocket::new] Add interface 172.23.168.18
[GvDiscoverSocket::new] Add interface 172.17.0.1
[GvDiscoverSocket::new] Add interface 10.46.0.0
No device found
The 172.23.168.18 interface is the one through which GVCP is correctly talking to the camera. The camera is on a different subnet, which is why it is showing as not found. The two subnets have a direct route through a switch.
When I tested this with a packet sniffer I saw that there was a control packet sent with destination address 10.46.0.0.
I think I was hitting something similar with #232; does it look similar? (Fortunately, I no longer need that functionality.) How are other GigEVision libs handling this? Is it possible to query the camera for the IP address it sees when the client talks to it?
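On the host side, one way to see which local address the routing table would pick for a given camera is to connect a UDP socket to it and read back the socket's own address. The snippet below is only a sketch of that general socket technique, not something aravis is confirmed to do; `local_address_for` is a hypothetical helper name, and 192.168.2.9 is simply the camera address that appears later in this thread.

```python
import socket

GVCP_PORT = 3956  # GigE Vision control port


def local_address_for(peer_ip: str, port: int = GVCP_PORT) -> str:
    """Return the local IPv4 address the kernel would use to reach peer_ip.

    Hypothetical helper for illustration; not part of aravis.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends nothing on the wire. It only fixes
        # the peer, which makes the kernel select a source address from the
        # routing table.
        sock.connect((peer_ip, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Camera address taken from the thread; replace with your own.
    print(local_address_for("192.168.2.9"))
```

This reports the host's routing decision only. It does not tell you what source address the camera actually sees if a NAT sits in between.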
Camera description:
- Manufacturer Tecphos
- Model Custom
- Interface: Ethernet UDP
Platform description:
- Aravis version 0.8.19
- OS: MacOS 11.0.1
- Hardware x86_64
I am in a similar situation. My results are:
[16:20:33.775] 🅸 interface> Found 0 USB3Vision device (among 10 USB devices)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 127.0.0.1 (127.0.0.1)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.1.135 (192.168.1.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.211.55.2 (10.211.55.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.37.129.2 (10.37.129.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.1 (192.168.2.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.16 (192.168.2.255)
[16:20:33.776] 🆆 interface> [ArvGVInterface::send_discover_packet] Error sending packet using local broadcast: Error sending message: Can't assign requested address
[16:20:34.780] 🅳 misc> Regex '^.*$' created from glob '*'
No device found
I am watching the interface with Wireshark. It sees no packets sent on this NIC. I know the hardware will respond to a discovery command. I have a short go program that delivers the discovery command to 192.168.2.255 and I can see the command and response on Wireshark.
When I run arv-tool-0.8 -a 192.168.2.9 -d all:3, it is able to communicate with the camera. I see many read commands and acks on Wireshark.
To understand what goes wrong, I need a Wireshark capture of the network traffic during arv-tool-0.8 -d all:4, the corresponding console output, the host network configuration and a drawing of the network topology.
Wireshark saw no network traffic during arv-tool-0.8 -d all:4
The console output was:
barry@Barrys-MBPR bin % arv-tool-0.8 -d all:4
[10:01:52.123] 🅸 interface> Found 0 USB3Vision device (among 10 USB devices)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 127.0.0.1 (127.0.0.1)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.1.135 (192.168.1.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.211.55.2 (10.211.55.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.37.129.2 (10.37.129.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.1 (192.168.2.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.16 (192.168.2.255)
[10:01:52.124] 🆆 interface> [ArvGVInterface::send_discover_packet] Error sending packet using local broadcast: Error sending message: Can't assign requested address
[10:01:53.126] 🅳 misc> Regex '^.*$' created from glob '*'
No device found
This is the section from ifconfig for the nic connected to the camera:
en17: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
options=6467<RXCSUM,TXCSUM,VLAN_MTU,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
ether 00:e0:4c:03:22:03
inet6 fe80::8bb:882c:74fb:476c%en17 prefixlen 64 secured scopeid 0x19
inet 192.168.2.16 netmask 0xffffff00 broadcast 192.168.2.255
nd6 options=201<PERFORMNUD,DAD>
media: autoselect (1000baseT <full-duplex>)
status: active
The camera is connected directly to that NIC port. The NIC is a USB-C -> GigE dongle.
Here is a wireshark capture after arv-tool-0.8 -a 192.168.2.9 -d all:4
Here is the console response to that command:
All the traffic was directed through 192.168.2.1. Here is the ifconfig section for 192.168.2.1:
bridge102: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=63<RXCSUM,TXCSUM,TSO4,TSO6>
ether 3a:f9:d3:6a:f3:66
inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
inet6 fe80::cf8:efe7:76eb:407d%bridge102 prefixlen 64 secured scopeid 0x1a
Configuration:
id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
ipfilter disabled flags 0x0
member: en12 flags=3<LEARNING,DISCOVER>
ifmaxaddr 0 port 27 priority 0 path cost 0
member: en17 flags=3<LEARNING,DISCOVER>
ifmaxaddr 0 port 25 priority 0 path cost 0
nd6 options=201<PERFORMNUD,DAD>
media: autoselect
status: active
It is some sort of virtual interface automatically configured by MacOS. I suspect its goal is to enable internet sharing across the NICs. It seems to be required to enable DHCP to serve the directly attached camera.
- I am not claiming the camera is spec compliant or production ready. It is in prototype testing. It looks like we need to provide ack pending responses for the memory reads. I do think it will respond correctly to a discover command.
- With my golang test I look for NIC en17, acquire its address, and send the discovery broadcast from there. That does work. The request packet is from and the response packet is to 192.168.2.16.
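For readers who want to repeat that kind of test, here is a rough Python equivalent of binding to one NIC's address and broadcasting a discovery command from it. It is a sketch under assumptions: the 8-byte header layout (key 0x42, flags 0x11, command 0x0002, zero payload length, request id 0xFFFF) is my understanding of the GVCP discovery command and should be checked against the GigE Vision specification, and `build_discovery_command` and `discover_from` are hypothetical names.

```python
import socket
import struct
from typing import Optional, Tuple

GVCP_PORT = 3956  # GigE Vision control port


def build_discovery_command(request_id: int = 0xFFFF) -> bytes:
    """Pack an 8-byte GVCP discovery command header (layout is an assumption)."""
    # key, flags (ack required + broadcast ack allowed), command,
    # payload length, request id; all big-endian
    return struct.pack(">BBHHH", 0x42, 0x11, 0x0002, 0, request_id)


def discover_from(local_ip: str, broadcast_ip: str,
                  timeout: float = 1.0) -> Optional[Tuple[str, bytes]]:
    """Broadcast a discovery command from local_ip; return (peer_ip, reply) or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        # Binding to the NIC's own address pins the source address of the request.
        sock.bind((local_ip, 0))
        sock.settimeout(timeout)
        sock.sendto(build_discovery_command(), (broadcast_ip, GVCP_PORT))
        try:
            data, peer = sock.recvfrom(2048)
        except socket.timeout:
            return None
        return peer[0], data


if __name__ == "__main__":
    # Addresses from the ifconfig output above: en17 and its broadcast address.
    print(discover_from("192.168.2.16", "192.168.2.255"))
```

Binding before sending is the important part: it keeps the request on the chosen address instead of leaving the choice to the host when several interfaces share a subnet.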
Thank you for entertaining this issue.
Hi @EmmanuelP , I think I have an approach that would fix this issue for most cases.
At the moment the source of the interface IP comes from calling arv_gv_interface_camera_locate
(https://github.com/AravisProject/aravis/blob/main/src/arvgvinterface.c#L535C1-L535C31)
This function seems to have 2 phases:
- First, it compares the masked IP value of each interface with the masked IP value of the camera; if they match, it uses that interface's IP.
- If that fails, it tries to send a basic control packet to the camera over each interface and uses the IP of the interface that got the answer back. This is the part where the wrong IP is obtained, not sure why.
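The first phase can be illustrated with a small Python sketch. This is not the aravis code (which is C); `pick_interface_by_subnet` is a hypothetical name, and the /24 netmask for 192.168.1.135 is an assumption inferred from the broadcast address in the logs earlier in the thread.

```python
import ipaddress
from typing import Dict, Optional


def pick_interface_by_subnet(interfaces: Dict[str, str],
                             camera_ip: str) -> Optional[str]:
    """Return the first local address whose masked value equals the camera's.

    `interfaces` maps local IPv4 address -> netmask. Illustration only.
    """
    camera = int(ipaddress.IPv4Address(camera_ip))
    for address, netmask in interfaces.items():
        mask = int(ipaddress.IPv4Address(netmask))
        local = int(ipaddress.IPv4Address(address))
        if (local & mask) == (camera & mask):
            return address
    return None  # no interface shares the camera's subnet; phase two is needed


if __name__ == "__main__":
    nics = {
        "192.168.1.135": "255.255.255.0",  # netmask assumed
        "192.168.2.16": "255.255.255.0",   # en17, from the ifconfig output
    }
    print(pick_interface_by_subnet(nics, "192.168.2.9"))
```

When the camera sits on a routed subnet, as in the original report, no interface matches and selection falls through to the second phase.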
My suggestion is that before the second phase, we could try sending the control packet only over the interface associated with the default gateway; if that doesn't get an answer, then do the second phase with all the interfaces. Alternatively, we can have an option that determines whether the second phase uses all the interfaces or just the primary one.
I think it is reasonable to prioritise that interface and it fixes the problem for us, though I couldn't figure out exactly why the second phase can sometimes select the wrong interface.
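To make the proposed ordering concrete, here is a minimal Python sketch. It is not the aravis implementation: `probe_in_priority_order` is a hypothetical name, and `probe` stands in for "send a control packet from this address and wait for an answer". Treating 172.23.168.18 as the default-gateway interface is an assumption based on the first log in this thread.

```python
from typing import Callable, Iterable, Optional


def probe_in_priority_order(candidates: Iterable[str],
                            default_address: Optional[str],
                            probe: Callable[[str], bool]) -> Optional[str]:
    """Probe the default-gateway interface first, then the rest in order."""
    ordered = list(candidates)
    if default_address in ordered:
        # Move the default interface to the front; keep the others as they were.
        ordered.remove(default_address)
        ordered.insert(0, default_address)
    for address in ordered:
        if probe(address):
            return address
    return None


if __name__ == "__main__":
    candidates = ["127.0.0.1", "10.46.0.0", "172.23.168.18"]
    # Assume both addresses get an answer from the camera.
    answers = {"10.46.0.0", "172.23.168.18"}
    print(probe_in_priority_order(candidates, "172.23.168.18",
                                  lambda a: a in answers))
```

With no default address supplied, the function behaves like the existing second phase and returns the first interface that answers, which in this example is 10.46.0.0.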
Please let me know your thoughts.
EDIT: I figured out the reason! The interface tested is a bridge created by a Kubernetes network plugin called Weave. Anything sent over that bridge ends up being masqueraded (or source NATed, if you like) and sent via the primary interface, and the response is redirected back. I still think my proposal is a good approach to deal with this.
Thanks @EmilioPeJu.
@EmmanuelP would you accept a PR that implements Emilio's suggestion?
The issue is simply that if there is an additional interface that can also contact the camera via a NAT, then there is a chance that the library will bind to that interface for the GVSP stream. But NATs will not allow new connections from outside.
This sounds unusual but it is normally the case when running inside a container or even running natively on a server that has kubernetes components installed.
Emilio's solution only changes the order in which the interfaces are chosen, so that the default one is first. This should not affect any other users.
I don't grasp the full implications of the proposal. A pull request would at least help me to test it. So please go ahead.
Thanks!