[WSL2] pre-release NAT issue with apt / ipv6

Open surfaceowl opened this issue 8 months ago • 2 comments

Windows Version

Microsoft Windows [Version 10.0.22631.5262]

WSL Version

2.5.7.0

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

6.6.87.1-1

Distro Version

Ubuntu 22.04

Other Software

Windows Terminal - Version: 1.22.11141.0

Repro Steps

Windows / WSL versions

| Component | Build |
|-----------|-------|
| Windows | 10.0.22631.5262 (23H2 Pro) |
| WSL | 2.5.7.0 |
| Kernel | 6.6.87.1-1 microsoft-standard-WSL2 |
| WSLg | 1.0.66 |
| MSRDC | 1.2.6074 |
| Distro | Ubuntu 22.04 (jammy) |

Issue summary

After upgrading from the stable 5.15 kernel to the current 6.6 preview
(wsl --update --pre-release) I experience intermittent TCP loss inside the VM:

  • First run after a cold Windows boot: sudo apt update hangs at
    Connecting to archive.ubuntu.com … while ping archive.ubuntu.com succeeds.
  • wsl --shutdown temporarily fixes it, but the hang returns 5–10 minutes later.
  • dmesg shows repeating
    WSL (…) ERROR: CheckConnection: getaddrinfo() failed: -5 and
    connect() failed: 101 immediately before each freeze.

It looks like the 6.6 guest boots much faster than 5.15 and is up before Windows NAT/DNS is ready; the IPv6 route also drops, either right away or after some time, killing long-running HTTPS flows (apt, Chrome).
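To check whether a stall is specific to one address family, something like the following could be run during a freeze. It is only a sketch: it assumes curl is installed, and the function names, the URL and the 5 s timeout are illustrative choices, not part of the original report.

```bash
# Sketch: compare HTTP reachability over IPv4 and IPv6.
# Assumptions: curl is installed; URL and timeout are illustrative.

# probe_family <4|6> <url>: prints "ok" if a response arrives in time, else "fail".
probe_family() {
  local family=$1 url=$2
  if curl "-${family}" --silent --head --max-time 5 --output /dev/null "$url"; then
    echo ok
  else
    echo fail
  fi
}

# summarize <v4-result> <v6-result>: turns the two results into a one-line verdict.
summarize() {
  case "$1/$2" in
    ok/ok)   echo "both families work" ;;
    ok/fail) echo "IPv6 path failing, IPv4 fine" ;;
    fail/ok) echo "IPv4 path failing, IPv6 fine" ;;
    *)       echo "both families failing" ;;
  esac
}

# Usage (during a freeze):
#   summarize "$(probe_family 4 http://archive.ubuntu.com/ubuntu/)" \
#             "$(probe_family 6 http://archive.ubuntu.com/ubuntu/)"
```

If the IPv6 probe fails while the IPv4 probe succeeds, that would be consistent with the IPv6 route dropping, though it would not prove the cause.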

Also - this suggested script from the Microsoft GitHub repo should be signed and available somewhere safe, to make collection of troubleshooting info easier and not another side project (https://github.com/Microsoft/WSL/blob/master/diagnostics/collect-networking-logs.ps1)

Steps to reproduce

  1. Fresh Ubuntu 22.04 on kernel 5.15 → no issue.
  2. wsl --update --pre-release (installs 6.6.87.1-1).
  3. Reboot Windows.
  4. Run sudo apt update → may hang; if not, wait ~10 min and run again → hang.
  5. wsl --shutdown → open WSL → apt works briefly, then the cycle repeats.
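
The "wait and run again" part of these steps could be automated roughly as below. This is a sketch, not what I actually ran: the helper name, the 60 s deadline and the 120 s interval are arbitrary assumptions, and it relies on coreutils `timeout` returning 124 when the deadline is hit.

```bash
# Sketch: classify a command as ok / hang / error using a deadline.
# Assumption: coreutils `timeout` is available (it exits 124 on timeout).

# run_with_deadline <seconds> <command...>
run_with_deadline() {
  local secs=$1 rc=0
  shift
  timeout "$secs" "$@" >/dev/null 2>&1 || rc=$?
  case $rc in
    0)   echo ok ;;
    124) echo hang ;;
    *)   echo error ;;
  esac
}

# Usage: log the state of apt every two minutes (run `sudo -v` first so
# sudo does not prompt inside the loop):
#   while true; do
#     printf '%s %s\n' "$(date -Is)" "$(run_with_deadline 60 sudo apt-get update)"
#     sleep 120
#   done
```

The timestamps of the first `hang` lines can then be lined up against dmesg.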

Attachments

  • wsl_logs_run_01.apt_NAT_fails.zip – collected during a freeze
    (collect-networking-logs.ps1 script).
  • wsl_logs_run_02.apt_NAT_working.zip – collected after forcing IPv4 and restarting (no freeze).

Expected behaviour

sudo apt update finishes in a few seconds and stays reliable, exactly as on kernel 5.15.

Actual behaviour

apt (and other HTTPS traffic) stalls; ICMP pings keep working. A full
wsl --shutdown or Windows reboot restores connectivity only temporarily.


Diagnostics

| Probe | Command / key output | Interpretation |
|-------|----------------------|----------------|
| Guest off-loads | `sudo ethtool -K eth0 gro off gso off` (no error) → freeze still occurs | Not a GRO/GSO issue. |
| Half-open TCP | `ss -tn state syn-sent`: empty during freeze | Freeze happens before TCP establishes. |
| Driver / MTU errors | `dmesg grep -iE 'hv_netvsc.*offload` … | … |
| NAT chain crash | dmesg entry: `GnsEngine … nft -a list chain ip nat WSLPOSTROUTING killed by signal 13` | Windows connectivity checker resets NAT just before apt stalls. |

Full dmesg excerpt (first freeze):

WSL (217) ERROR: CheckConnection: getaddrinfo() failed: -5
WSL (217) ERROR: CheckConnection: connect() failed: 101
… 
WSL (217 - GnsEngine) ERROR: nft -a list chain ip nat WSLPOSTROUTING killed by signal 13
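
To pull only these lines out of the kernel log while waiting for a freeze, a small filter along these lines might help. It is a sketch: the function name is made up, and the pattern simply matches the two strings seen in the excerpt above.

```bash
# Sketch: keep only the connectivity-probe and NAT-chain lines from stdin.
# The pattern matches the strings seen in the excerpt; the name is illustrative.
gns_events() {
  grep -E 'CheckConnection|GnsEngine' || true   # no match is not an error
}

# Usage: follow the kernel log with human-readable timestamps
#   sudo dmesg -wT | gns_events
```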

### Work-arounds tested

| Work-around | Outcome |
|-------------|---------|
| **Startup guard** – delay login until a default route *and* DNS work (script below) | Removes the first-boot hang, but later freezes still occur. |
| Disable guest off-loads:<br>`sudo ethtool -K eth0 gro off gso off` | No improvement ⇒ not a GRO/GSO issue. |
| Force apt to IPv4:<br>`Acquire::ForceIPv4 "true"` | **Completely stable for 3 days** ⇒ problem lies on the IPv6/NAT path. |
| Raise IPv4 precedence in `/etc/gai.conf`:<br>`precedence ::ffff:0:0/96  100` | Same positive effect as forcing IPv4. |
| Mask Microsoft connectivity probe:<br>`systemctl mask --now gnsd.service gnsd.socket` | Reduces freeze frequency but does not eliminate it. |
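
For anyone wanting to try the IPv4 work-around, one way to persist it is sketched below. The file name `99force-ipv4` and the helper function are assumptions for illustration; only the `Acquire::ForceIPv4 "true"` setting itself comes from the table above.

```bash
# Sketch: persist the apt IPv4 work-around as a drop-in config file.
# Assumptions: the file name 99force-ipv4 and this helper are illustrative.

# write_force_ipv4 [conf-dir]: writes the drop-in (default dir needs root).
write_force_ipv4() {
  local conf_dir=${1:-/etc/apt/apt.conf.d}
  printf 'Acquire::ForceIPv4 "true";\n' > "${conf_dir}/99force-ipv4"
}

# One-off alternative that changes no files:
#   sudo apt-get -o Acquire::ForceIPv4=true update
```

Deleting the drop-in file reverts the change, which makes it easy to re-test IPv6 later.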

#### Startup guard used for the first-boot fix  

```bash
# ~/.wsl-startup.sh — launched via /etc/wsl.conf
[[ "${BASH_SOURCE[0]}" != "$0" ]] && return      # no effect if sourced
(
  set -Eeuo pipefail
  end=$((SECONDS + 10))                          # wait up to 10 s
  until ip route | grep -q '^default' \
        && getent hosts ubuntu.com >/dev/null 2>&1; do
      (( SECONDS < end )) || exit 70             # give up after 10 s
      sleep 0.5
  done
)
```

Additional diagnostics collected

| Probe | Command & key output | Interpretation |
|-------|----------------------|----------------|
| Guest off-loads toggled | `sudo ethtool -K eth0 gro off gso off`: no error | Off-loads disabled yet freeze persists → not an off-load bug. |
| Half-open TCP check | `ss -tn state syn-sent`: empty during freeze | No SYNs stuck; stall occurs before TCP establishes. |
| Driver / MTU errors | `dmesg grep -iE 'hv_netvsc.*offload` … | … |
| NAT chain crash | dmesg shows `GnsEngine … nft -a list chain ip nat WSLPOSTROUTING killed by signal 13` | Windows connectivity probe resets NAT just before each stall. |

Notes & hypotheses

  • Hang appears only after kernel upgrade - no issues with 5.15 - same hardware
  • Hang may happen if IPv6 is preferred; forcing IPv4 seems to fix it.
  • No guest-side MTU or off-load errors are logged.
  • Freezes coincide with GnsEngine recreating the NAT chain, suggesting a race between the faster-booting 6.6 guest and Windows’ NAT/DNS stack.
  • Troubleshooting steps and wording refined with assistance from ChatGPT (OpenAI) - I am not a NAT, ipv6 expert - but some of the workarounds did definitely make a difference.
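
To test the hypothesis that the IPv6 default route disappears around the time of a freeze, a watcher like the following could be left running in a second terminal. It is a sketch under assumptions: the function names and the 5 s polling interval are illustrative, and it only reports whether `ip -6 route show default` prints a default entry.

```bash
# Sketch: log a timestamped line whenever the IPv6 default route appears
# or disappears. Names and the polling interval are illustrative.

# route_state: reads `ip -6 route` output on stdin, prints "present" or "missing".
route_state() {
  if grep -q '^default'; then echo present; else echo missing; fi
}

# watch_v6_route [interval-seconds]: polls until interrupted with Ctrl-C.
watch_v6_route() {
  local interval=${1:-5} prev="" cur
  while true; do
    cur=$(ip -6 route show default | route_state)
    if [[ $cur != "$prev" ]]; then
      printf '%s IPv6 default route %s\n' "$(date -Is)" "$cur"
      prev=$cur
    fi
    sleep "$interval"
  done
}

# Usage:
#   watch_v6_route 5
```

A `missing` line shortly before an apt stall would support the race hypothesis; no state change at all would point elsewhere.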

Thanks in advance for looking at this - please let me know what further traces or kernel flags would help.

wsl_logs_run_01.apt_NAT_fails.WslNetworkingLogs-2025-05-09_09-06-08.zip

wsl_logs_run_02.apt_NAT_working.WslNetworkingLogs-2025-05-09_09-09-17.zip

Expected Behavior

Using WSL2 with kernel 6.6+, both apt and ping can be run immediately with correct results, and can also be run at any later time with correct results. Specific expectations for each command are:

sudo apt update produces results like this immediately:

Hit:1 https://download.docker.com/linux/ubuntu jammy InRelease
Hit:2 https://apt.llvm.org/jammy llvm-toolchain-jammy-18 InRelease
Hit:3 https://packages.redis.io/deb jammy InRelease
...
Fetched 384 kB in 2s (202 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.

ping cnn.com immediately produces results like:

PING cnn.com (151.101.3.5) 56(84) bytes of data.
64 bytes from 151.101.3.5 (151.101.3.5): icmp_seq=1 ttl=51 time=4.19 ms
64 bytes from 151.101.3.5 (151.101.3.5): icmp_seq=2 ttl=51 time=4.19 ms
^C
--- cnn.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 4.186/4.187/4.189/0.001 ms

Actual Behavior

apt prints one blank line to the terminal and then times out, requiring the user to press Ctrl-C to get back to the prompt. ping times out.

Diagnostic Logs

No response

surfaceowl, May 09 '25 18:05

Diagnostic information
Multiple log files found, using: https://github.com/user-attachments/files/20126912/wsl_logs_run_01.apt_NAT_fails.WslNetworkingLogs-2025-05-09_09-06-08.zip
Detected appx version: 2.5.7.0
optional-components.txt not found

github-actions[bot], May 09 '25 18:05

I am also facing the same issue

EnesArican, Jun 09 '25 12:06

Same issue as well. With IPv4 working, I have all of my needs met, but it is annoying to keep getting error messages every time I boot or open VS Code.

AjayChambers, Aug 03 '25 22:08