cloudstack icon indicating copy to clipboard operation
cloudstack copied to clipboard

[Draft] KVM: enable no-mac-spoofing on virtual nics

Open weizhouapache opened this issue 1 year ago • 18 comments

Description

In shared, isolated networks and vpcs in advanced zone, user vms can easily perform ip/arp/mac spoofing by pretending to be another vm in the same network.

  • change mac
ip link set dev eth0 down
ip link set dev eth0 address <MAC of another vm>
ip link set dev eth0 up
  • change IP
ip addr add dev eth0 <IP/cidr of another vm in same network>
ip addr del dev eth0 <old IP/cidr>
  • these can also be done by netplan/network-scripts configurations.

libvirt has a network traffic filtering subsystem which can be used to prevent spoofing. (https://libvirt.org/formatnwfilter.html#concepts)

  • It provides some pre-existing network filters (https://libvirt.org/formatnwfilter.html#pre-existing-network-filters), no-mac-spoofing is missed in the table.
  • Ideally we should use clean-traffic, however, the IP/ARP anti-spoofing does not work in our testing, as the IP is not specified in the libvirt vm definition XML by cloudstack.
  • libvirt can auto-detect the IP but the auto-detected IP is not always correct, for example, when user configures (wrong) static IP inside the user VM.

This PR adds no-mac-spoofing for each nic to prevent mac spoofing.

It could be an improvement PR to support all MAC/IP/ARP spoofing

  • it might lead to some overhead.
  • it is better to be configurable for guest network, network offering or account
  • IP/network information needs to be added to the InterfaceDef for each type of interface (bridge, direct, network, vhostuser, etc).
  • it might be not needed in advanced zone with security groups, as there are already iptables/ebtables/ipset rules to prevent the spoofings
  • it needs some discussion and testing. clean-traffic is good, but it might not be what we want. we need to evaluate the pre-existing network filters and probably consider creating customized filters. refer to https://libvirt.org/firewall.html#the-network-filter-driver

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [ ] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

weizhouapache avatar Apr 19 '24 17:04 weizhouapache

@NuxRo this is probably you want to have. I am not sure if there are downside .

@DaanHoogland and me have tested it. mac anti-spoofing works, but ip anti-spoofing does not work.

weizhouapache avatar Apr 19 '24 18:04 weizhouapache

@blueorangutan package

weizhouapache avatar Apr 19 '24 18:04 weizhouapache

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Apr 19 '24 18:04 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9339

blueorangutan avatar Apr 19 '24 19:04 blueorangutan

@blueorangutan test rocky8 kvm-rocky8

weizhouapache avatar Apr 19 '24 19:04 weizhouapache

@weizhouapache a [SL] Trillian-Jenkins test job (rocky8 mgmt + kvm-rocky8) has been kicked to run smoke tests

blueorangutan avatar Apr 19 '24 19:04 blueorangutan

Hey @weizhouapache, can you provide more context to the issue fixed by this PR? Should we add a no arp spoofing as well?

BryanMLima avatar Apr 19 '24 21:04 BryanMLima

Hey @weizhouapache, can you provide more context to the issue fixed by this PR? Should we add a no arp spoofing as well?

@BryanMLima Updated the PR description

weizhouapache avatar Apr 20 '24 07:04 weizhouapache

[SF] Trillian test result (tid-9937) Environment: kvm-rocky8 (x2), Advanced Networking with Mgmt server r8 Total time taken: 52334 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8951-t9937-kvm-rocky8.zip Smoke tests completed. 127 look OK, 2 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 314.34 test_events_resource.py
test_01_events_resource Error 314.35 test_events_resource.py
test_04_deploy_vm_for_other_user_and_test_vm_operations Failure 100.58 test_network_permissions.py
ContextSuite context=TestNetworkPermissions>:teardown Error 1.52 test_network_permissions.py

blueorangutan avatar Apr 20 '24 12:04 blueorangutan

On KVM hypervisors with Security Groups enabled (Advanced + Shared networking) this is already handled by ebtables in security_group.py

Doing this you would do double packet inspection and there might even be a conflict. Have you looked at this?

My suggestion:

  • Enable this in Libvirt
  • Remove functionality from security_group.py

wido avatar Apr 22 '24 07:04 wido

My suggestion:

* Enable this in Libvirt

* Remove functionality from security_group.py

meaning, @wido , this is good but we need to add a removal of some of the " -j DROP/ACCEPT" lines from the script? sounds like some precision surgery. Do you know which ones to drop? cc @weizhouapache .

DaanHoogland avatar Apr 22 '24 09:04 DaanHoogland

My suggestion:

* Enable this in Libvirt

* Remove functionality from security_group.py

meaning, @wido , this is good but we need to add a removal of some of the " -j DROP/ACCEPT" lines from the script? sounds like some precision surgery. Do you know which ones to drop? cc @weizhouapache .

It would, I think if you take a look it starts here: https://github.com/apache/cloudstack/blob/8ff2c018cc5b3fc69bcd8756695d04b384e46ab8/scripts/vm/network/security_group.py#L280

  • default_ebtables_rules()
  • destroy_ebtables_rules()

Those would no longer be needed

wido avatar Apr 22 '24 09:04 wido

My suggestion:

* Enable this in Libvirt

* Remove functionality from security_group.py

meaning, @wido , this is good but we need to add a removal of some of the " -j DROP/ACCEPT" lines from the script? sounds like some precision surgery. Do you know which ones to drop? cc @weizhouapache .

It would, I think if you take a look it starts here:

https://github.com/apache/cloudstack/blob/8ff2c018cc5b3fc69bcd8756695d04b384e46ab8/scripts/vm/network/security_group.py#L280

  • default_ebtables_rules()
  • destroy_ebtables_rules()

Those would no longer be needed

@wido actually I am thinking of disabling this change for vms with security groups the script security_group.py programs iptables/ebtables rules including the mac/ip/arp anti-spoofing, it has been proved to be working well with both ipv4/ipv6 addresses and one/multiple network nics. this PR only contains no-mac-spoofing which is not good enough to replace the security_group.py. it looks like a precise surgery to remove the ebtables rules, as @DaanHoogland said. we could drop the methods in security_group.py if all mac/ip/arp anti-spoofing are supported (see the PR description).

other than that, the upgrade could be an issue as the VMs started in old versions (before upgrade) do not have the configuration in their VM XML definition.

weizhouapache avatar Apr 22 '24 10:04 weizhouapache

actually I am thinking of disabling this change for vms with security groups

I second that. It will be simpler and the will not cripple the much security groups implementation.

DaanHoogland avatar Apr 22 '24 11:04 DaanHoogland

My suggestion:

* Enable this in Libvirt

* Remove functionality from security_group.py

meaning, @wido , this is good but we need to add a removal of some of the " -j DROP/ACCEPT" lines from the script? sounds like some precision surgery. Do you know which ones to drop? cc @weizhouapache .

It would, I think if you take a look it starts here: https://github.com/apache/cloudstack/blob/8ff2c018cc5b3fc69bcd8756695d04b384e46ab8/scripts/vm/network/security_group.py#L280

  • default_ebtables_rules()
  • destroy_ebtables_rules()

Those would no longer be needed

@wido actually I am thinking of disabling this change for vms with security groups the script security_group.py programs iptables/ebtables rules including the mac/ip/arp anti-spoofing, it has been proved to be working well with both ipv4/ipv6 addresses and one/multiple network nics. this PR only contains no-mac-spoofing which is not good enough to replace the security_group.py. it looks like a precise surgery to remove the ebtables rules, as @DaanHoogland said. we could drop the methods in security_group.py if all mac/ip/arp anti-spoofing are supported (see the PR description).

other than that, the upgrade could be an issue as the VMs started in old versions (before upgrade) do not have the configuration in their VM XML definition.

Sounds good. I would only add this to VMs without any SG. That would get my approval.

wido avatar Apr 22 '24 20:04 wido

[SF] Trillian Build Failed (tid-10024)

blueorangutan avatar Apr 25 '24 12:04 blueorangutan

[SF] Trillian test result (tid-10030) Environment: kvm-ubuntu22 (x2), Advanced Networking with Mgmt server u22 Total time taken: 70862 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8951-t10030-kvm-ubuntu22.zip Smoke tests completed. 126 look OK, 3 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 338.24 test_events_resource.py
test_01_events_resource Error 338.26 test_events_resource.py
test_list_system_vms_metrics_history Failure 0.48 test_metrics_api.py
test_04_deploy_vm_for_other_user_and_test_vm_operations Failure 102.16 test_network_permissions.py
ContextSuite context=TestNetworkPermissions>:teardown Error 1.57 test_network_permissions.py

blueorangutan avatar Apr 26 '24 10:04 blueorangutan

@weizhouapache Good effort.

Like @wido says, the problems this would solve are not an issue in SG zones usually, so indeed we should not apply any of this there.

Otherwise it'd be a nice "win" for operators of regular Advanced Zones to apply anti-spoofing measures. We already have something somewhat similar for VMWare.

I'd be happy to use all reasonable libvirt nwfilter features, make them options in Network Offering:

  • IP anti-spoofing (with or without auto-detect)
  • ARP anti-spoofing
  • MAC anti-spoofing

Would it even be reasonable to allow the operator to specify more nwfilter? Ie load whatever xml file from /usr/share/libvirt/nwfilter/ that they want?

NuxRo avatar May 01 '24 10:05 NuxRo