
optimise IP multicast

Open rade opened this issue 10 years ago • 12 comments

Currently weave implements IP multicast in terms of broadcast, i.e. multicast packets are always sent to all peers. That is sub-optimal.

Weave does observe IGMP and hence could build up knowledge about which peers contain receivers for specific multicast groups, and then use that knowledge to route packets to just those peers.

rade avatar Nov 12 '14 07:11 rade

I think you would need some kind of distributed database for doing IGMP snooping. Weave would need to detect IGMP join/leave requests and broadcast this information by updating the list of peers subscribed to each multicast group in the database (note that simple gossip updates would not be enough in this case, as order is important). Weave would then detect all multicast traffic and translate it to unicast...
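To illustrate the ordering point, here is a hypothetical Go sketch (invented names, not Weave's actual data structures): if each peer numbers its own join/leave observations, receivers can discard stale updates instead of depending on delivery order.

package multicast

// GroupEvent is one observed IGMP join or leave, numbered by the peer
// that saw it so receivers can ignore out-of-order gossip deliveries.
type GroupEvent struct {
    Peer, Group string // peer name and multicast group, e.g. "224.1.2.3"
    Join        bool   // true = join, false = leave
    Seq         uint64 // per-peer sequence number; higher wins
}

// Membership tracks, per group and peer, the latest event applied.
type Membership struct {
    lastSeq map[string]map[string]uint64
    joined  map[string]map[string]bool
}

func NewMembership() *Membership {
    return &Membership{
        lastSeq: map[string]map[string]uint64{},
        joined:  map[string]map[string]bool{},
    }
}

// Apply folds an event into the table, ignoring anything older than
// what has already been applied for that (group, peer) pair.
func (m *Membership) Apply(ev GroupEvent) {
    if m.lastSeq[ev.Group] == nil {
        m.lastSeq[ev.Group] = map[string]uint64{}
        m.joined[ev.Group] = map[string]bool{}
    }
    if ev.Seq <= m.lastSeq[ev.Group][ev.Peer] {
        return // stale update delivered late by gossip
    }
    m.lastSeq[ev.Group][ev.Peer] = ev.Seq
    m.joined[ev.Group][ev.Peer] = ev.Join
}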

inercia avatar Nov 17 '14 17:11 inercia

@inercia

order is important

How do ordinary IGMP-aware routers handle that?

rade avatar Nov 17 '14 18:11 rade

I don't know the particular details, but I'd guess that routers detect join/leave messages on ports, serialize that information into some kind of multicast forwarding table, and use that table to control what multicast traffic is forwarded. I think a distributed router would need to do that serialization on the join/leave information detected on the virtual ports...
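As an illustration of such a forwarding table (again just a hedged Go sketch, not how any particular router implements it), lookups would fall back to flooding when no membership has been learned for a group, which is effectively what Weave does today for all groups:

package multicast

// ForwardingTable maps a multicast group to the set of ports (or peers)
// on which at least one receiver has been observed via IGMP.
type ForwardingTable map[string]map[string]bool

// PortsFor returns the ports a packet for `group` should be sent to.
// Unknown groups are flooded to all ports, mirroring switch behaviour.
func (t ForwardingTable) PortsFor(group string, allPorts []string) []string {
    subs := t[group]
    if len(subs) == 0 {
        return allPorts
    }
    ports := make([]string, 0, len(subs))
    for p := range subs {
        ports = append(ports, p)
    }
    return ports
}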

inercia avatar Nov 17 '14 20:11 inercia

@inercia :+1:

greenpau avatar May 28 '15 15:05 greenpau

Do you have any update on getting multicast optimized?

erandu avatar Dec 09 '16 13:12 erandu

@erandu no, this has not changed

bboreham avatar Dec 09 '16 13:12 bboreham

Currently weave implements IP multicast in terms of broadcast, i.e. multicast packets are always sent to all peers.

I spent some time recently understanding how multicast works in Weave. Elaborating a bit here to give perspective on the problem, in case anyone is interested. For example, let's take a cluster with 5 nodes running an application container that sends to 224.1.2.3. L3 multicast traffic from the container is mapped to L2 multicast and sent out; in this case 224.1.2.3 maps to the L2 multicast MAC address 01:00:5e:01:02:03.
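As an aside, that L3-to-L2 mapping is the standard one from RFC 1112: the low-order 23 bits of the group address are copied into the 01:00:5e:00:00:00 prefix. A quick illustrative Go snippet (not Weave code):

package main

import (
    "fmt"
    "net"
)

// multicastMAC maps an IPv4 multicast group address to its Ethernet
// multicast MAC: the low 23 bits of the IP go into the 01:00:5e prefix.
func multicastMAC(group net.IP) net.HardwareAddr {
    v4 := group.To4()
    return net.HardwareAddr{0x01, 0x00, 0x5e, v4[1] & 0x7f, v4[2], v4[3]}
}

func main() {
    fmt.Println(multicastMAC(net.ParseIP("224.1.2.3"))) // 01:00:5e:01:02:03
}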

L2 multicast traffic from the container passes through the weave bridge and the veth pair vethwe-bridge <-> vethwe-datapath, and reaches the OVS datapath, which has the following ports:

root@ip-172-20-62-75:/home/admin# ovs-dpctl show
system@datapath:
    lookups: hit:270 missed:269 lost:3
    flows: 15
    masks: hit:646 total:2 hit/pkt:1.20
    port 0: datapath (internal)
    port 1: vethwe-datapath
    port 2: vxlan-6784 (vxlan: df_default=false, ttl=0)

Weave programs the OVS datapath on node 172.20.39.2 (with peers being 172.20.68.242, 172.20.73.60, 172.20.83.7 and 172.20.62.75) with the rule below. Each set(tunnel(...)),2 pair in the actions stamps VXLAN tunnel metadata for one peer and outputs the packet on port 2 (vxlan-6784), so the packet is effectively broadcast to all peers irrespective of whether any container on those peers is interested in receiving packets for multicast IP 224.1.2.3.

in_port(1),eth(src=0a:78:b1:2c:11:ae,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.204s, actions:set(tunnel(tun_id=0x51b960,src=172.20.39.2,dst=172.20.68.242,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x5f8960,src=172.20.39.2,dst=172.20.83.7,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x732960,src=172.20.39.2,dst=172.20.56.145,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x5ae960,src=172.20.39.2,dst=172.20.62.75,tos=0x0,ttl=64,flags(df,key))),2,set(tunnel(tun_id=0x44b960,src=172.20.39.2,dst=172.20.73.60,tos=0x0,ttl=64,flags(df,key))),2,0

Similarly, each peer is configured to receive packets for 224.1.2.3 irrespective of whether any local container is interested in receiving the traffic:

tunnel(tun_id=0x96051b/0xffffffffffffffff,src=172.20.68.242/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=62:ae:1b:34:4e:76,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:102, bytes:12546, used:0.552s, actions:1,0
tunnel(tun_id=0x96044b/0xffffffffffffffff,src=172.20.73.60/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=ea:f2:dd:f4:f2:c1,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.016s, actions:1,0
tunnel(tun_id=0x9605f8/0xffffffffffffffff,src=172.20.83.7/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=a2:a9:d0:e9:99:c0,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:103, bytes:12669, used:0.116s, actions:1,0
tunnel(tun_id=0x9605ae/0xffffffffffffffff,src=172.20.62.75/255.255.255.255,dst=172.20.39.2/255.255.255.255,tos=0/0,ttl=0/0,flags(key)),in_port(2),eth(src=1a:1f:83:fa:c1:fb,dst=01:00:5e:01:02:03),eth_type(0/0x0000), packets:106, bytes:13038, used:0.245s, actions:1,0

This is sub-optimal, as reported in this issue. I will share notes on a possible solution.
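For a concrete sense of the potential gain: with group-membership knowledge, the sender-side flow above would only need to tunnel to the peers that actually have receivers. Purely as an illustration (hand-edited from the rule above, not real output), if only 172.20.62.75 had a container subscribed to 224.1.2.3, the actions could shrink to something like:

in_port(1),eth(src=0a:78:b1:2c:11:ae,dst=01:00:5e:01:02:03),eth_type(0/0x0000), actions:set(tunnel(tun_id=0x5ae960,src=172.20.39.2,dst=172.20.62.75,tos=0x0,ttl=64,flags(df,key))),2,0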

murali-reddy avatar Aug 06 '18 06:08 murali-reddy

As noted, IGMP is one of the standards-based solutions that can be leveraged to optimise the multicast traffic. From the IGMP network-topology perspective, the containers running the multicast applications form the hosts, while Weave running on the node takes the responsibility of a switch doing IGMP snooping, and should implement the host-router interactions noted in RFC 2236, which basically cover the scenarios below:

  • a host joining a multicast group
  • a host leaving a multicast group
  • the router proactively sending IGMP queries to gather multicast group membership

It should be easy to incorporate the semantics of the IGMP protocol in Weave and, based on the membership, populate the OVS datapath so that multicast is forwarded only to the intended group members.
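For what it's worth, a minimal sketch of that snooping step (illustrative Go only, not Weave's code; a real implementation also needs the queries and timers from RFC 2236) could look like:

package main

import (
    "fmt"
    "net"
)

// IGMPv2 message layout per RFC 2236: type(1) max-resp(1) checksum(2) group(4).
const (
    igmpV2Report = 0x16 // membership report: a local container joined
    igmpLeave    = 0x17 // leave group: a local container left
)

type snooper struct {
    groups map[string]bool // multicast groups with at least one local receiver
}

// observe processes one IGMP payload (the bytes following the IPv4 header).
func (s *snooper) observe(igmp []byte) {
    if len(igmp) < 8 {
        return
    }
    group := net.IP(igmp[4:8]).String()
    switch igmp[0] {
    case igmpV2Report:
        // a real implementation would also refresh a timer here
        s.groups[group] = true
    case igmpLeave:
        // a real implementation would first send a group-specific query
        // and only expire the group if no report comes back (RFC 2236)
        delete(s.groups, group)
    }
}

func main() {
    s := &snooper{groups: map[string]bool{}}
    // fabricated v2 membership report for 224.1.2.3 (checksum left zero)
    s.observe([]byte{igmpV2Report, 0, 0, 0, 224, 1, 2, 3})
    fmt.Println(s.groups) // map[224.1.2.3:true]
}

The resulting group set is exactly what would drive the OVS datapath flows, so that multicast for a group is tunnelled only to peers that reported membership.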

murali-reddy avatar Aug 06 '18 08:08 murali-reddy

It should be easy to incorporate the semantics of the IGMP protocol in Weave and, based on the membership, populate the OVS datapath so that multicast is forwarded only to the intended group members.

@murali-reddy, having more specific rules may be disadvantageous: rule processing within OVS consumes resources, requires careful table design, etc. Having a single generic rule speeds things up in terms of flow processing. It is a trade-off.

greenpau avatar Aug 06 '18 13:08 greenpau

Out of curiosity, does weave still implement IP multicast using broadcast? Thanks

ceclinux avatar May 06 '21 03:05 ceclinux

The implementation has not changed; however, I don't think the original wording conveys the correct idea.

If you have a cluster of 3 machines and are running 10 containers that receive multicast, Weave Net will do 2 unicast sends to convey the packets machine-to-machine, then inject the packets as multicast to be received by the 10 containers.

Matthias meant that the machine-to-machine part always reaches all machines, not that it is literally implemented using broadcast.

bboreham avatar May 06 '21 06:05 bboreham

[image attached]

So like this?

RobKoerts avatar Jun 25 '21 11:06 RobKoerts