Azure VFP Switch Extension differs between SKUs

Open hach-que opened this issue 5 months ago • 15 comments

Is your feature request related to a problem? Please describe.

I am attempting to run Calico on Windows 11 machines, but network connectivity works for pods on Windows Server 2025 and not on Windows 11. Having inspected the virtual switch state via vfpctrl, the only real difference seems to be that VFP is installed as a filtering extension on client SKUs instead of a forwarding extension.

The Switch extension in the registry indicates different INFs and names, even for the same GUID:

(Server 2025)

Image

(Windows 11)

Image

Now I'm not certain this is the root cause, but everything else seems to be identical. On the Windows 11 machine, vfpctrl reports that the rules/layers of the port are dropping packets due to "no match" and I suspect that the filtering extension is lacking some functionality that Calico expects to have working. Forwarding extensions seem to be able to do a lot more, so this is my best guess as to the root cause.

Describe the solution you'd like

Please provide a way for the VFP extension to be switched from "filtering" to "forwarding" mode on client SKUs, so that we can use Kubernetes to orchestrate build/test jobs.

Describe alternatives you've considered

I'm currently trying to track down the source of vfpfilter.ini and vfpext.inf to see if it's possible to directly install the forwarding extension on a client SKU, but so far I haven't located the files.

hach-que avatar Aug 05 '25 11:08 hach-que

Thank you for creating an Issue. Please note that GitHub is not an official channel for Microsoft support requests. To create an official support request, please open a ticket here. Microsoft and the GitHub Community strive to provide a best effort in answering questions and supporting Issues on GitHub.

github-actions[bot] avatar Aug 05 '25 11:08 github-actions[bot]

Ok so it looks like the registry key/value that needs changing is:

  • Key: Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Kernel
  • Value: FilterClass
  • Old Data: ms_switch_filter
  • New Data: ms_switch_forward

Unfortunately Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\NetworkSetup2 is protected by TrustedInstaller, so even elevating to SYSTEM is not enough to change this registry value.

This PowerShell script seems to get to a TrustedInstaller command prompt, from which regedit can be launched to change the value:

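# Suppress confirmation prompts for the duration of the script.
$ConfirmPreference = "None"
# Relaunch elevated if the current session is not running as Administrator.
$isAdmin = ([Security.Principal.WindowsPrincipal] [Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
if (-not $isAdmin) {
    Start-Process powershell -ArgumentList "-NoProfile -File `"$PSCommandPath`"" -Verb RunAs
    exit
}
Set-ExecutionPolicy -ExecutionPolicy bypass
# NtObjectManager provides Get-NtProcess and New-Win32Process.
Install-Module -Name NtObjectManager
# Start the TrustedInstaller service so its process can be used as a parent.
Start-Service -Name TrustedInstaller
$parent = Get-NtProcess -ServiceName TrustedInstaller
# Spawn cmd.exe in a new console as a child of TrustedInstaller.
$proc = New-Win32Process cmd.exe -CreationFlags NewConsole -ParentProcess $parent
# Restore the default confirmation level.
$ConfirmPreference = "High"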

hach-que avatar Aug 05 '25 13:08 hach-que

Hmm, seems like that wasn't enough:

PS C:\Users\Build> vfpctrl /port 0FF64E2F-491F-4373-98E3-ECC8A16ED0CA /get-switch-info

 ITEM LIST
===========

  SWITCH INFORMATION
    VFP is running as a filtering extension
    SRIOV is NOT supported by external NIC
    GFT is NOT supported by external NIC
    QoS Tx caps offload is NOT supported by external NIC
    QoS Tx reservations offload is NOT supported by external NIC
    QoS Tx caps offload is NOT in use
    QoS Tx reservations offload is NOT in use
    VFP does NOT allow GFT feature
    GFT is NOT enabled on the external NIC
    Number of current VFs: 0
    Number of pending direct config requests: 0

hach-que avatar Aug 05 '25 13:08 hach-que

Changing the filter class to ms_switch_filter on Server 2025 doesn't seem to have any effect either, so I don't think the FilterClass registry value is used at all:

 ITEM LIST
===========

  SWITCH INFORMATION
    VFP is running as a forwarding extension
    SRIOV is NOT supported by external NIC
    GFT is NOT supported by external NIC
    QoS Tx caps offload is NOT supported by external NIC
    QoS Tx reservations offload is NOT supported by external NIC
    QoS Tx caps offload is NOT in use
    QoS Tx reservations offload is NOT in use
    VFP does NOT allow GFT feature
    GFT is NOT enabled on the external NIC
    Number of current VFs: 0
    Number of pending direct config requests: 0
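
For anyone who wants to check the mode from a script rather than by eye, here is a minimal Python sketch that pulls the mode out of captured `/get-switch-info` text. It only parses output shaped like the listings in this thread; the function name `vfp_extension_mode` is hypothetical, and how the text is captured (for example by redirecting vfpctrl output to a file) is left to the reader.

```python
import re
from typing import Optional

# Matches the mode line seen in the switch information listings in this thread.
_MODE_PATTERN = re.compile(r"VFP is running as a (filtering|forwarding) extension")


def vfp_extension_mode(switch_info: str) -> Optional[str]:
    """Return "filtering" or "forwarding", or None if the mode line is absent."""
    match = _MODE_PATTERN.search(switch_info)
    return match.group(1) if match else None


# Sample text copied from the Server 2025 listing (truncated).
sample = """
  SWITCH INFORMATION
    VFP is running as a forwarding extension
    SRIOV is NOT supported by external NIC
"""

print(vfp_extension_mode(sample))  # forwarding
```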

hach-que avatar Aug 05 '25 14:08 hach-que

Huzzah! Importing the whole registry key from Windows Server 2025 across to Windows 11 gets it to register as a forwarding extension correctly (see below).

The following registry keys are different between Server 2025 and 11, not just FilterClass:

  • {F74F241B-440F-4433-BB28-00F89EAD20D8}\Kernel\FilterClass
  • {F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\0006\@
  • {F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\0008\@
  • {F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f7-5923-47c0-9a68-d0bafb577901}\0004\@
  • {F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f200-5923-47c0-9a68-d0bafb577901}\0002\@

I'm guessing the FilterClass value gets embedded in the hexadecimal values of those registry keys, but as to how those registry keys are parsed I have no idea. What I do know is that after applying the registry values below, containers on Windows 11 nodes work again (in the list below mobius is a Windows 11 node with the registry keys applied, win-nsmfqr5v4ft is a Windows Server 2025 install, and the other nodes are Windows 11 without the registry keys applied):

connectivity-windows-curl-8b7kw              1/1     Running            0                    5h23m   10.244.54.161    hulk              <none>           <none>
connectivity-windows-curl-bb6cw              1/1     Running            32 (3m6s ago)        5h23m   10.244.134.124   mobius            <none>           <none>
connectivity-windows-curl-gjlpm              0/1     Running            49 (137m ago)        5h23m   10.244.16.220    hawkeye           <none>           <none>
connectivity-windows-curl-qlnrp              1/1     Running            4 (<invalid> ago)    5h23m   10.244.146.168   win-nsmfqr5v4ft   <none>           <none>
connectivity-windows-curl-xfdp7              1/1     Running            0                    5h23m   10.244.22.231    ms-marvel         <none>           <none>
connectivity-windows-httpget-5nnh7           0/1     CrashLoopBackOff   23 (137m ago)        3h35m   10.244.22.235    ms-marvel         <none>           <none>
connectivity-windows-httpget-h8htm           0/1     CrashLoopBackOff   23 (137m ago)        3h35m   10.244.16.222    hawkeye           <none>           <none>
connectivity-windows-httpget-j7p2d           0/1     CrashLoopBackOff   23 (137m ago)        3h35m   10.244.54.165    hulk              <none>           <none>
connectivity-windows-httpget-kc7jr           1/1     Running            63 (3m6s ago)        3h29m   10.244.134.125   mobius            <none>           <none>
connectivity-windows-httpget-mhlcd           1/1     Running            60 (<invalid> ago)   3h35m   10.244.146.167   win-nsmfqr5v4ft   <none>           <none>
connectivity-windows-nslookup-2jhx7          0/1     CrashLoopBackOff   26 (141m ago)        4h8m    10.244.16.221    hawkeye           <none>           <none>
connectivity-windows-nslookup-fbl5q          0/1     CrashLoopBackOff   26 (140m ago)        4h8m    10.244.22.234    ms-marvel         <none>           <none>
connectivity-windows-nslookup-flbd8          1/1     Running            65 (3m6s ago)        4h8m    10.244.134.123   mobius            <none>           <none>
connectivity-windows-nslookup-sz6dj          1/1     Running            54                   4h8m    10.244.146.166   win-nsmfqr5v4ft   <none>           <none>
connectivity-windows-nslookup-x7sb5          0/1     CrashLoopBackOff   26 (140m ago)        4h8m    10.244.54.164    hulk              <none>           <none>

Here's the registry file if anyone else needs to import it into the registry via regedit.exe (you do need to get regedit.exe running as TrustedInstaller as per the previous instructions):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Kernel]
"Optional"=dword:00000001
"FilterClass"="ms_switch_forward"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1ef-5923-47c0-9a68-d0bafb577901}]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1ef-5923-47c0-9a68-d0bafb577901}\0014]
@=hex(ffff0007):01,00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\0002]
@=hex(ffff0012):6d,00,73,00,5f,00,77,00,69,00,6e,00,76,00,66,00,70,00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\0006]
@=hex(ffff0012):4d,00,69,00,63,00,72,00,6f,00,73,00,6f,00,66,00,74,00,20,00,41,\
  00,7a,00,75,00,72,00,65,00,20,00,56,00,46,00,50,00,20,00,53,00,77,00,69,00,\
  74,00,63,00,68,00,20,00,45,00,78,00,74,00,65,00,6e,00,73,00,69,00,6f,00,6e,\
  00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\0008]
@=hex(ffff0012):4d,00,69,00,63,00,72,00,6f,00,73,00,6f,00,66,00,74,00,20,00,41,\
  00,7a,00,75,00,72,00,65,00,20,00,56,00,46,00,50,00,20,00,53,00,77,00,69,00,\
  74,00,63,00,68,00,20,00,45,00,78,00,74,00,65,00,6e,00,73,00,69,00,6f,00,6e,\
  00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\0014]
@=hex(ffff0011):01

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\0016]
@=hex(ffff0011):01

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f0-5923-47c0-9a68-d0bafb577901}\005a]
@=hex(ffff0011):01

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f1-5923-47c0-9a68-d0bafb577901}]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f1-5923-47c0-9a68-d0bafb577901}\0006]
@=hex(ffff2012):76,00,6d,00,6e,00,65,00,74,00,65,00,78,00,74,00,65,00,6e,00,73,\
  00,69,00,6f,00,6e,00,00,00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f7-5923-47c0-9a68-d0bafb577901}]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f7-5923-47c0-9a68-d0bafb577901}\0002]
@=hex(ffff000d):1b,24,4f,f7,0f,44,33,44,bb,28,00,f8,9e,ad,20,d8

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f7-5923-47c0-9a68-d0bafb577901}\0004]
@=hex(ffff0012):6d,00,73,00,5f,00,73,00,77,00,69,00,74,00,63,00,68,00,5f,00,66,\
  00,6f,00,72,00,77,00,61,00,72,00,64,00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f1f7-5923-47c0-9a68-d0bafb577901}\0006]
@=hex(ffff0011):01

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f200-5923-47c0-9a68-d0bafb577901}]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f200-5923-47c0-9a68-d0bafb577901}\0002]
@=hex(ffff0012):76,00,66,00,70,00,65,00,78,00,74,00,2e,00,69,00,6e,00,66,00,00,\
  00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f200-5923-47c0-9a68-d0bafb577901}\0004]
@=hex(ffff0012):49,00,6e,00,73,00,74,00,61,00,6c,00,6c,00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkSetup2\Filters\{F74F241B-440F-4433-BB28-00F89EAD20D8}\Properties\{a111f200-5923-47c0-9a68-d0bafb577901}\0028]
@=hex(ffff1003):dd,07,0c,00,04,00,05,00,00,00,00,00,00,00,00,00
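
A note on the hex data above: several of the values decode as null-terminated UTF-16LE strings, which supports the guess that FilterClass is mirrored inside the Properties subkeys. The following is a minimal Python sketch (not part of the original workflow) that decodes three values copied from the export; the helper names are hypothetical, and the meaning of the `hex(ffff....)` type tags is not interpreted here.

```python
import uuid


def reg_hex_to_bytes(hex_body: str) -> bytes:
    """Convert the comma-separated hex body of a .reg value into raw bytes."""
    # Drop the line-continuation backslashes, newlines and indentation.
    cleaned = hex_body.replace("\\", "").replace("\n", "").replace(" ", "")
    return bytes(int(part, 16) for part in cleaned.split(",") if part)


def decode_utf16(hex_body: str) -> str:
    """Decode a value as UTF-16LE text and strip the trailing null terminator."""
    return reg_hex_to_bytes(hex_body).decode("utf-16-le").rstrip("\x00")


# {a111f1f7-...}\0004 from the export above.
FILTER_CLASS_HEX = r"""6d,00,73,00,5f,00,73,00,77,00,69,00,74,00,63,00,68,00,5f,00,66,\
  00,6f,00,72,00,77,00,61,00,72,00,64,00,00,00"""

# {a111f200-...}\0002 from the export above.
INF_NAME_HEX = r"""76,00,66,00,70,00,65,00,78,00,74,00,2e,00,69,00,6e,00,66,00,00,\
  00"""

# {a111f1f7-...}\0002 from the export above (16 bytes, little-endian GUID layout).
GUID_HEX = "1b,24,4f,f7,0f,44,33,44,bb,28,00,f8,9e,ad,20,d8"

print(decode_utf16(FILTER_CLASS_HEX))  # ms_switch_forward
print(decode_utf16(INF_NAME_HEX))      # vfpext.inf
print(str(uuid.UUID(bytes_le=reg_hex_to_bytes(GUID_HEX))).upper())
# F74F241B-440F-4433-BB28-00F89EAD20D8
```

So the `0004` value under `{a111f1f7-...}` carries the same class name as Kernel\FilterClass, and the `0002` value under `{a111f200-...}` names the INF, which would explain why changing FilterClass alone had no effect.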

hach-que avatar Aug 05 '25 14:08 hach-que

All nodes now pass connectivity tests once the registry keys are applied to those machines as well (and they're restarted):

Image

So yeah, ideally we would have an easier method of switching this driver from filter to forward mode than applying a bunch of registry keys through TrustedInstaller.

hach-que avatar Aug 05 '25 14:08 hach-que

Interesting. What is your setup for Windows Server 2025? How did you set up Calico, and is it Kubernetes-related or not? Thanks!

zylxjtu avatar Aug 06 '25 17:08 zylxjtu

@zylxjtu I'm using UET/RKM to deploy clusters https://github.com/RedpointGames/uet/wiki/Deploying-Kubernetes-clusters-for-build-automation though I should note this is still experimental while I fix up networking issues on Windows.

Unfortunately after my last comment, I've noticed Windows Containers losing network connectivity after the node has been running for a while. I suspect it's caused by Calico's TCP reset bug, so I am currently experimenting with replacing Calico with Flannel to mitigate it, based on what RKE2 does.

hach-que avatar Aug 06 '25 18:08 hach-que

Yep, everything is working perfectly fine on Windows 11 clients with the above registry patch and using Flannel instead of Calico (UET/RKM has now been updated to deploy Flannel instead of Calico):

Image

hach-que avatar Aug 07 '25 09:08 hach-que

Which version of Windows 11 are you on?

ntrappe-msft avatar Aug 08 '25 22:08 ntrappe-msft

24H2, Build 26100.4652

hach-que avatar Aug 09 '25 17:08 hach-que

For other people running into this issue, UET/RKM now has a uet cluster switch-vfp-mode command which can be used to switch back and forth between filtering and forwarding modes as an Administrator:

uet cluster switch-vfp-mode --filter
uet cluster switch-vfp-mode --forward

RKM will also automatically switch the VFP mode and reboot the machine when it starts up if it's not currently in forwarding mode.

hach-que avatar Aug 10 '25 09:08 hach-que

Is this a good workaround for you? Or are you still not seeing the behavior you'd expect?

ntrappe-msft avatar Aug 18 '25 14:08 ntrappe-msft

@ntrappe-msft This workaround is good enough for me for now; I've incorporated the behaviour into the Kubernetes manager I use so I'm no longer impacted by this issue.

That said, this workaround does require writing registry keys as TrustedInstaller and perhaps doing things in a way that might break with future Windows updates, so an official solution would be preferred long term.

hach-que avatar Aug 19 '25 07:08 hach-que

Tracking this internally: 58993043.

ntrappe-msft avatar Aug 27 '25 15:08 ntrappe-msft

This issue has been open for 90 days with no updates and has no assignees. Please provide an update or close this issue.

@ntrappe-msft I suspect KB5072033 breaks pod reachability via services from other machines on Windows 11. Our pod connectivity broke on all machines in the last few days, which is around when that KB was installed.

Once that KB is installed, it's no longer possible for an external machine to connect to a pod over TCP ports that have been exposed as NodePort. e.g. it's no longer possible to connect on port 30529 to this service, and the machine attempting the connection gets "ConnectionRefused".

> kubectl --context=falcon get services -o wide
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE      SELECTOR
hawkeye-1            NodePort    10.104.159.167   <none>        7001:31311/TCP,7002:30843/TCP,45600:30529/TCP   11m      uba-cloud-id=b58fa41673dc33a969e35e3c47849bd6
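
To reproduce the symptom from another machine without the real client, a plain TCP probe is enough. This is a minimal Python sketch, assuming the node is reachable by some hostname or IP; `probe_tcp` and the host in the usage comment are hypothetical, and only the port number comes from the service listing above.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote end actively rejected the connection (TCP RST).
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets silently dropped.
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks and similar errors.
        return f"error: {exc}"


# Usage (hypothetical node address; 30529 is the NodePort mapped to 45600):
#   print(probe_tcp("hawkeye.example", 30529))
```

A "refused" result matches the ConnectionRefused described above, while a "timeout" would instead point at packets being dropped somewhere along the path.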

I suspect it's related to this networking change:

[Networking] Fixed: This update fixes an issue where external virtual switches lose their physical network adapter (NIC) bindings after a host restart. When this happens, the switches revert to internal mode, resulting in loss of network connectivity for virtual machines and blocking normal server operations.

I've tried uninstalling the KB and haven't had any luck getting networking working again. I'm going to try a network reset in case that helps. The KB is the only thing I can think of that's causing this since we haven't changed anything else on these machines.

hach-que avatar Dec 11 '25 03:12 hach-que