
GVSP fails when client has Multiple NICs

Open gilesknap opened this issue 3 years ago • 11 comments

Describe the bug GVSP fails to connect to the client when the client is running on a machine with multiple network interfaces.

To Reproduce I am running the EPICS AreaDetector client (https://github.com/areaDetector/ADAravis) to connect to an AVT Manta. All GVCP communication works fine, but when I initiate acquisition the client sees no GVSP packets. I have done a network trace and can see that the client passes the ephemeral port for GVSP to the camera; however, the IP address it passes along with the port belongs to the wrong network interface and is not routable from the camera.

If I run the same code on a different client with a single NIC then the streaming works correctly.
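
A quick way to check which local address the operating system itself would use to reach the camera is to connect a UDP socket to the camera and read back the socket's own address. The following is only a minimal sketch, not what Aravis does internally; the function name `local_address_for` is made up for illustration, and 3956 is the standard GVCP port.

```python
import socket


def local_address_for(peer_ip: str, peer_port: int = 3956) -> str:
    """Return the local IPv4 address the OS routing table picks to reach peer_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packet; it only resolves the route
        # and fixes the local address for this socket.
        sock.connect((peer_ip, peer_port))
        return sock.getsockname()[0]
```

On a host with multiple NICs, comparing this result with the stream destination address seen in the network trace should show whether the advertised address differs from the one the routing table would choose.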

Camera description:

  • Manufacturer AVT
  • Model Manta G-235 PoE
  • Interface: Ethernet UDP

Platform description:

  • Aravis version
  • OS: Ubuntu 20.04
  • Hardware x86_64

gilesknap avatar Jun 14 '21 06:06 gilesknap

Could you attach the output of arv-tool-0.8 -d all:3 ?

EmmanuelP avatar Jul 05 '21 08:07 EmmanuelP

Thanks for getting back to me.

The results are:

arv-tool-0.8 -d all:3
Found 0 USB3Vision device (among 5 USB devices)
[GvDiscoverSocket::new] Add interface 127.0.0.1
[GvDiscoverSocket::new] Add interface 172.23.168.18
[GvDiscoverSocket::new] Add interface 172.17.0.1
[GvDiscoverSocket::new] Add interface 10.46.0.0
No device found

The 172.23.168.18 interface is the one through which GVCP is correctly talking to the camera. The camera is on a different subnet, which is why it is showing as not found. The two subnets have a direct route through a switch.

gilesknap avatar Jul 12 '21 06:07 gilesknap

When I tested this with a packet sniffer I saw that there was a control packet sent with destination address 10.46.0.0.

gilesknap avatar Jul 12 '21 07:07 gilesknap

I think I was hitting something similar with #232, does it look similar? (Fortunately, I no longer need that functionality.) How are other GigEVision libs handling this? Is it possible to query the camera for the IP address it sees when the client talks to it?

eudoxos avatar Aug 08 '21 09:08 eudoxos

Camera description:

  • Manufacturer Tecphos
  • Model Custom
  • Interface: Ethernet UDP

Platform description:

  • Aravis version 0.8.19
  • OS: MacOS 11.0.1
  • Hardware x86_64

I am in a similar situation. My results are:

[16:20:33.775] 🅸 interface> Found 0 USB3Vision device (among 10 USB devices)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 127.0.0.1 (127.0.0.1)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.1.135 (192.168.1.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.211.55.2 (10.211.55.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.37.129.2 (10.37.129.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.1 (192.168.2.255)
[16:20:33.776] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.16 (192.168.2.255)
[16:20:33.776] 🆆 interface> [ArvGVInterface::send_discover_packet] Error sending packet using local broadcast: Error sending message: Can't assign requested address
[16:20:34.780] 🅳 misc> Regex '^.*$' created from glob '*'
No device found

I am watching the interface with Wireshark. It sees no packets sent on this NIC. I know the hardware will respond to a discovery command. I have a short Go program that delivers the discovery command to 192.168.2.255, and I can see the command and response in Wireshark.

When I run arv-tool-0.8 -a 192.168.2.9 -d all:3 it is able to communicate with the camera. I see many read commands and acks in Wireshark.

bvwj avatar Oct 27 '21 21:10 bvwj

To understand what goes wrong, I need a wireshark capture of the network traffic during arv-tool-0.8 -d all:4, the corresponding console output, the host network configuration and a drawing of the network topology.

EmmanuelP avatar Oct 28 '21 04:10 EmmanuelP

Wireshark saw no network traffic during arv-tool-0.8 -d all:4

The console output was:

barry@Barrys-MBPR bin % arv-tool-0.8 -d all:4
[10:01:52.123] 🅸 interface> Found 0 USB3Vision device (among 10 USB devices)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 127.0.0.1 (127.0.0.1)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.1.135 (192.168.1.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.211.55.2 (10.211.55.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 10.37.129.2 (10.37.129.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.1 (192.168.2.255)
[10:01:52.124] 🅸 interface> [GvDiscoverSocket::new] Add interface 192.168.2.16 (192.168.2.255)
[10:01:52.124] 🆆 interface> [ArvGVInterface::send_discover_packet] Error sending packet using local broadcast: Error sending message: Can't assign requested address
[10:01:53.126] 🅳 misc> Regex '^.*$' created from glob '*'
No device found

This is the section from ifconfig for the nic connected to the camera:

en17: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
	options=6467<RXCSUM,TXCSUM,VLAN_MTU,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
	ether 00:e0:4c:03:22:03 
	inet6 fe80::8bb:882c:74fb:476c%en17 prefixlen 64 secured scopeid 0x19 
	inet 192.168.2.16 netmask 0xffffff00 broadcast 192.168.2.255
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect (1000baseT <full-duplex>)
	status: active

The camera is connected directly to that NIC port. The NIC is a USB-C -> GigE dongle.

Here is a wireshark capture after arv-tool-0.8 -a 192.168.2.9 -d all:4

arv-tool-a.pcapng.zip

Here is the console response to that command:

arv-tool-a_console.txt.zip

All the traffic was directed through 192.168.2.1. Here is the ifconfig section for 192.168.2.1:

bridge102: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	options=63<RXCSUM,TXCSUM,TSO4,TSO6>
	ether 3a:f9:d3:6a:f3:66 
	inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
	inet6 fe80::cf8:efe7:76eb:407d%bridge102 prefixlen 64 secured scopeid 0x1a 
	Configuration:
		id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
		maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
		root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
		ipfilter disabled flags 0x0
	member: en12 flags=3<LEARNING,DISCOVER>
	        ifmaxaddr 0 port 27 priority 0 path cost 0
	member: en17 flags=3<LEARNING,DISCOVER>
	        ifmaxaddr 0 port 25 priority 0 path cost 0
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active

It is some sort of virtual interface automatically configured by MacOS. I suspect its goal is to enable internet sharing across the NICs. It seems to be required to enable DHCP to serve the directly attached camera.

  1. I am not claiming the camera is spec compliant or production ready. It is in prototype testing. It looks like we need to provide ack pending responses for the memory reads. I do think it will respond correctly to a discover command.

  2. With my golang test I look for NIC en17, acquire its address, and send the discovery broadcast from there. That does work. The request packet is from and the response packet is to 192.168.2.16.
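
For readers who want to reproduce the test in point 2, here is a rough Python sketch of the same idea: bind a UDP socket to the address of the chosen NIC so the discovery broadcast leaves from that source address. This is not the Go program mentioned above. The function names are hypothetical, and the packet bytes (key 0x42, flags 0x11, command 0x0002, zero length, request id 0xFFFF) are my assumption of the GVCP discovery command layout, so check them against the GigE Vision specification before relying on them.

```python
import ipaddress
import socket
import struct

GVCP_PORT = 3956  # standard GigE Vision control port


def broadcast_for(ip: str, netmask: str) -> str:
    """Directed broadcast address of the subnet that contains ip."""
    network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    return str(network.broadcast_address)


def discovery_packet(request_id: int = 0xFFFF) -> bytes:
    # Assumed header layout: key, flags (ack required + broadcast ack allowed),
    # command (discovery), payload length, request id; all big-endian.
    return struct.pack(">BBHHH", 0x42, 0x11, 0x0002, 0, request_id)


def discover_from(local_ip: str, netmask: str, timeout: float = 1.0):
    """Send one discovery broadcast from local_ip; return (reply, sender) or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.bind((local_ip, 0))  # pins the source address to the chosen NIC
        sock.settimeout(timeout)
        target = (broadcast_for(local_ip, netmask), GVCP_PORT)
        sock.sendto(discovery_packet(), target)
        try:
            return sock.recvfrom(2048)
        except socket.timeout:
            return None
```

With the ifconfig output above, `discover_from("192.168.2.16", "255.255.255.0")` would send to 192.168.2.255 from en17. Note that the netmask must be given in dotted form rather than the hexadecimal form that ifconfig prints on MacOS.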

Thank you for entertaining this issue.

bvwj avatar Oct 28 '21 15:10 bvwj

Hi @EmmanuelP , I think I have an approach that would fix this issue for most cases.

At the moment the source of the interface IP comes from calling arv_gv_interface_camera_locate (https://github.com/AravisProject/aravis/blob/main/src/arvgvinterface.c#L535C1-L535C31). This function seems to have 2 phases:

  • First, it compares the masked IP value of each interface with the masked IP value of the camera; if they match, it uses that interface's IP.
  • If the first phase didn't work, it tries to send a basic control packet to the camera over each interface and uses the IP of the interface that got the answer back. This is the part where the wrong IP is obtained, not sure why.
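
To make the first phase concrete, the subnet comparison can be sketched in Python as below. This is an illustration of the idea only, not a translation of the Aravis C code; `same_subnet_interface` is a hypothetical name, and the /24 netmasks in the example are assumed from the broadcast addresses in the logs earlier in this thread.

```python
import ipaddress


def same_subnet_interface(interfaces, camera_ip):
    """Return the IP of the first interface whose subnet contains camera_ip, else None.

    interfaces is a list of (ip, netmask) pairs in enumeration order.
    """
    camera = ipaddress.IPv4Address(camera_ip)
    for ip, netmask in interfaces:
        network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
        if camera in network:
            return ip
    return None


# Interfaces taken from the MacOS log above, with assumed /24 netmasks.
interfaces = [
    ("192.168.1.135", "255.255.255.0"),
    ("192.168.2.1", "255.255.255.0"),
    ("192.168.2.16", "255.255.255.0"),
]
print(same_subnet_interface(interfaces, "192.168.2.9"))
```

In this sketch the first match wins, so when two interfaces share a subnet (as 192.168.2.1 and 192.168.2.16 do here) the result depends on enumeration order. A camera on a different subnet matches nothing, and the second phase takes over.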

My suggestion is that before the second phase, we could try sending the control packet only through the interface associated with the default gateway. If the camera doesn't answer, we then do the second phase with all the interfaces. Alternatively, we can have an option that determines whether the second phase uses all the interfaces or just the primary one.
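
The ordering change can be sketched as follows. This is a hedged illustration rather than the eventual patch: `choose_interface` and the `probe` callable are hypothetical, with `probe(ip)` standing in for "send a control packet from this interface and report whether the camera answered".

```python
def choose_interface(interface_ips, default_ip, probe):
    """Probe default_ip first, then the remaining interfaces in their original order."""
    ordered = [ip for ip in interface_ips if ip == default_ip]
    ordered += [ip for ip in interface_ips if ip != default_ip]
    for ip in ordered:
        if probe(ip):
            return ip
    return None


# Illustrative case: both interfaces get an answer, but only one is directly routable.
answering = {"10.46.0.0", "172.23.168.18"}
candidates = ["10.46.0.0", "172.23.168.18"]
print(choose_interface(candidates, "172.23.168.18", answering.__contains__))
```

Without a preferred interface (`default_ip=None`) the function falls back to the original order, so hosts where the current behaviour already works should see no change.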

I think it is reasonable to prioritise that interface and it fixes the problem for us, though I couldn't figure out exactly why the second phase can sometimes select the wrong interface.

Please let me know your thoughts.

EDIT: I figured out the reason! The interface tested is a bridge created by a Kubernetes network plugin called Weave. Anything sent over that bridge ends up being masqueraded (or source NATed, if you like) and sent via the primary interface, and the response is redirected back. I still think my proposal is a good approach to deal with this.

EmilioPeJu avatar Nov 23 '23 19:11 EmilioPeJu

Thanks @EmilioPeJu.

@EmmanuelP would you accept a PR that implements Emilio's suggestion?

The issue is simply that if there is an additional interface that can also contact the camera via a NAT, then there is a chance that the library will bind to that interface for the GVSP stream. But NATs will not allow new connections from outside.

This sounds unusual, but it is normally the case when running inside a container or even running natively on a server that has Kubernetes components installed.

Emilio's solution only changes the order in which the interfaces are tried, such that the default one is first. This should not affect any other users.

gilesknap avatar Nov 29 '23 19:11 gilesknap

@EmmanuelP would you accept a PR that implements Emilio's suggestion?

I don't grasp the full implications of the proposal. A pull request would at least help me to test it. So please go ahead.

EmmanuelP avatar Nov 29 '23 22:11 EmmanuelP

Thanks!

gilesknap avatar Nov 30 '23 13:11 gilesknap