
Wrong outgoing source IP (0.0.0.0)

Open Monke202 opened this issue 3 years ago • 2 comments

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
  • [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue

Describe the bug

Outgoing packets originating from the firewall itself on the WAN interface have the source IP address 0.0.0.0 instead of the IP address configured on the corresponding interface.

Last known working version: 22.7.3_2

To Reproduce

Steps to reproduce the behavior:

  1. Upgrade version 22.7 -> 22.7.3
  2. Upgrade version 22.7_4 -> 22.7.3_2
  3. The update to version 22.7.4 fails because the network is no longer functional

Expected behavior

The firewall can reach the external network.

Describe alternatives you considered

A source NAT rule for 0.0.0.0/32 on the WAN interface solves the problem.
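The rule behind this workaround (quoted verbatim later in the thread from /tmp/rules.debug) matches firewall-originated packets that would otherwise leave with source 0.0.0.0 and rewrites them to the WAN interface's primary address; em1 is the WAN interface on the reporting system and should be adjusted to match:

```pf
# Workaround: rewrite packets sourced from 0.0.0.0 to the WAN address.
# em1 is the reporter's WAN interface; (em1:0) is its first address.
nat on em1 inet from 0.0.0.0/32 to any -> (em1:0) port 1024:65535
```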

Screenshots

nat_rule

Relevant log files

Error configd.py Timeout (120) executing : firmware remote

Additional context

Environment

Multi-WAN Setup

Firewall: OPNsense 22.7.4-amd64, FreeBSD 13.1-RELEASE-p2, OpenSSL 1.1.1q 5 Jul 2022
CPU: Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz (4 cores, 8 threads)
Network: Intel® I210, Intel® I350

Monke202 avatar Sep 19 '22 10:09 Monke202

best check the local routing table first (netstat -nr)

AdSchellevis avatar Sep 19 '22 11:09 AdSchellevis

Thanks for the reply. We looked into the routing table and couldn't find any suspicious entries. We also compared it with a working OPNsense setup at another location (single-WAN setup).

tcpdump on the WAN interface shows that the outgoing packets have 0.0.0.0 as their source address. When we configure an outgoing NAT rule for source 0.0.0.0, tcpdump shows the correct source IP for traffic to the WAN gateway.
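To quantify this from a capture, one can grep the tcpdump text output for a 0.0.0.0 source. The sketch below uses sample lines whose format is an assumption; on a live system, real output from something like tcpdump -ni em1 would be piped through the same grep:

```shell
# Count packets whose source address is 0.0.0.0 in tcpdump text output.
# The heredoc holds sample lines (format assumed); on a live system,
# pipe real capture output through the same grep instead.
capture=$(cat <<'EOF'
12:00:01.000000 IP 0.0.0.0.34147 > 9.9.9.9.53: UDP, length 40
12:00:01.100000 IP 192.168.0.250.51234 > 8.8.8.8.53: UDP, length 40
EOF
)
bad=$(printf '%s\n' "$capture" | grep -c ' IP 0\.0\.0\.0\.' || true)
echo "packets sourced from 0.0.0.0: $bad"
```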

Monke202 avatar Sep 20 '22 10:09 Monke202

We have the same issue on a firewall during the upgrade to 22.7.4. Other firewalls with a near-identical config in the same organisation completed the upgrade successfully; the primary differences between the affected one and the others are that it has an SSN interface and, as the primary site, a more complex set of NAT and firewall rules generally.

Doing tests at the firewall CLI with the host command against a specified external nameserver (the initial symptom was failing DNS resolution) and watching the firewall logs, you can see that all traffic originating from the firewall gets 0.0.0.0 as its source IP. Pinging an IP directly similarly shows the source as 0.0.0.0.

putt1ck avatar Nov 22 '22 07:11 putt1ck

NB: the suggested workaround of adding a NAT rule for 0.0.0.0/32 worked, thanks @Monke202

putt1ck avatar Nov 22 '22 07:11 putt1ck

@putt1ck still unsure where this traffic originates from and why it gets a 0.0.0.0 source address. Do you still have a setup to reproduce? Is "0.0.0.0" found in ifconfig output or in the file /tmp/rules.debug?

Cheers, Franco

fichtner avatar Nov 22 '22 07:11 fichtner

So I logged into the firewall over SSH and ran the host command:

  # host -t A google.de 9.9.9.9
  ;; connection timed out; no servers could be reached

Viewing the logs via the web UI and you see

  | 0.0.0.0:34147 | 9.9.9.9:53 | udp | let out anything from firewall host itself |

where on an install without the issue the 0.0.0.0 shows as the firewall IP. The workaround rule is in place on this firewall, so I assume removing it will allow further tests (but that would need to wait until out of hours).

ifconfig doesn't show an interface with 0.0.0.0

/tmp/rules.debug has only the rule added as a workaround, i.e.

  nat on em1 inet from 0.0.0.0/32 to any -> (em1:0) port 1024:65535 # Workaround for internal NAT issue

putt1ck avatar Nov 22 '22 07:11 putt1ck

@putt1ck just to be on the safe side, can you disable "Use shared forwarding between packet filter, traffic shaper and captive portal" under Firewall: Settings: Advanced and see if the issue persists? If it does, it's a routing table issue in FreeBSD 13... netstat -nr4 might reveal something in that case.

fichtner avatar Nov 22 '22 07:11 fichtner

Very old bug report, not sure if it still applies (and it was never resolved): https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=159103

fichtner avatar Nov 22 '22 08:11 fichtner

@fichtner Shared Forwarding was not enabled. I tried enabling it just for fun, but either way (without the manual rule workaround) it doesn't resolve the issue. Regarding that bug, I can't see a "network_interfaces" conf line that looks like the one described in its workaround.

putt1ck avatar Nov 23 '22 05:11 putt1ck

@putt1ck the bug was for FreeBSD, which uses rc.conf syntax to init devices, so that part doesn't apply to us. It suggests a problem with loopback devices. Do you have additional loopback devices configured?

Cheers, Franco

fichtner avatar Nov 23 '22 05:11 fichtner

I completed the upgrade and the issue still exists, if you want to try more fixes.

Running the upgrade from the console reports a probably unrelated issue:

py37-markupsafe has a missing dependency: python37
py37-markupsafe has a missing dependency: py37-setuptools
py37-markupsafe is missing a required shared library: libpython3.7m.so.1.0

>>> Missing package dependencies were detected.
>>> Found 2 issue(s) in the package database.

pkg-static: No packages available to install matching 'python37' have been found in the repositories
pkg-static: No packages available to install matching 'py37-setuptools' have been found in the repositories
>>> Summary of actions performed:

python37 dependency failed to be fixed
py37-setuptools dependency failed to be fixed


putt1ck avatar Nov 23 '22 05:11 putt1ck

# pkg remove py37-markupsafe

Long unused... it was introduced by a bug in the package manager while renaming the package from mixed-case to lower-case letters.

fichtner avatar Nov 23 '22 06:11 fichtner

Only lo0 is listed, and the same set of interfaces is listed on a branch-site firewall with identical hardware which updated without this issue arising. The only obvious difference in network interfaces is the SSN interface, which does not exist on the branch firewall.

Looking at the NAT configs:

  • NPTv6 is the same (unused).
  • Outbound is basically the same, except the main office (the one with the issue) has more manual rules (larger office, more internal subnets), and some of those rules have specified external addresses (the connection has a /29) where the branch only uses "Interface address".
  • One-to-one has an entry at main but none at branch.
  • Port forward at branch has only a few entries (2 + anti-lockout) while main has ~25, including one "loopback" rule for capturing NTP queries (Android, why does it ignore DHCP conf?):

! LAN address 123 (NTP) firewall internal interface address 123 (NTP)

putt1ck avatar Nov 23 '22 07:11 putt1ck

I had a similar issue today; the only thing that fixed it was a reboot of the OPNsense appliance.

Before this happened, I noticed strange DNS issues and started to debug Unbound DNS as well as IPv6, which didn't solve anything. Systems behind the appliance worked fine (except DNS, due to side effects of this bug), but everything on the firewall itself used 0.0.0.0 as the source address.

encbladexp avatar Dec 08 '22 22:12 encbladexp

What version was the affected firewall running? Did it start on an upgrade or maybe some other recent change?

putt1ck avatar Dec 09 '22 05:12 putt1ck

Currently it is OPNsense 22.7.9_3-amd64. I am unsure if it started with the upgrade, but shortly before, I did the following:

  • Turned on IPv6 Track Interface for an additional VLAN; before that I changed the Prefix Delegation Size from 64 to 60.
  • Played around with Unbound DNS settings.
  • Did the upgrade from OPNsense 22.7.9-amd64 to OPNsense 22.7.9_3-amd64.

I am still searching for a correlation. I only noticed the bug because Unbound DNS stopped working, which has some impact on my network for sure. A quick check in an SSH session showed that all packets originating from the appliance itself use 0.0.0.0 as the source IP; all forwarded packets work as expected.

encbladexp avatar Dec 09 '22 11:12 encbladexp

Just had a similar issue; for some reason my routing table changed from:

  Internet:
  Destination        Gateway            Flags     Netif Expire
  default            xxx.xxx.xxx.1        UGS        igc1
  xxx.xxx.xxx.0/24   link#2               U          igc1
  xxx.xxx.xxx.121    link#2               UHS         lo0

to

  Internet:
  Destination        Gateway            Flags     Netif Expire
  default            92.108.79.1        UGS        igc1
  xxx.xxx.xxx.0/24     link#2             U          igc1
  xxx.xxx.xxx.1        link#2             UHS        igc1      <<<---- ??
  xxx.xxx.xxx.121      link#2             UHS         lo0

If anyone has the same 0.0.0.0 outbound issue, it might be worth checking if the gateway isn't configured on a link (which would explain the behaviour, although I don't know where it came from).
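The suspicious pattern above can be spotted mechanically. This sketch uses example addresses in a heredoc standing in for real netstat -nr4 output, and flags a default gateway that has acquired its own UHS host route on the interface:

```shell
# Flag a default gateway that has its own UHS host route on the
# interface (instead of being reached via the network's link# route).
# The heredoc is a sample with example addresses; replace it with
# real `netstat -nr4` output.
table=$(cat <<'EOF'
default            192.0.2.1          UGS        igc1
192.0.2.0/24       link#2             U          igc1
192.0.2.1          link#2             UHS        igc1
192.0.2.121        link#2             UHS        lo0
EOF
)
gw=$(printf '%s\n' "$table" | awk '$1 == "default" { print $2 }')
hits=$(printf '%s\n' "$table" \
    | awk -v gw="$gw" '$1 == gw && $3 == "UHS" && $4 != "lo0"' \
    | grep -c . || true)
echo "suspicious UHS routes for gateway $gw: $hits"
```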

AdSchellevis avatar Dec 13 '22 18:12 AdSchellevis

might be https://github.com/opnsense/core/commit/a230326d7fe165e597cd2d5a30b064e0b3a1c58c as well

AdSchellevis avatar Jan 24 '23 09:01 AdSchellevis

Not really, this has been happening since 22.1 (FreeBSD 13).

fichtner avatar Jan 24 '23 10:01 fichtner

I'm just in a Teams session with a customer seeing the same phenomenon. In their setup NAT was disabled; we set it to manual without adding any rules, and then it replaced 0.0.0.0 with the original WAN address.

Maybe it helps :)

mimugmail avatar Jan 25 '23 10:01 mimugmail

@mimugmail what does the routing table look like? Mine was missing a link route, which is why it looked similar to https://github.com/opnsense/core/commit/a230326d7fe165e597cd2d5a30b064e0b3a1c58c (but it may be something completely different)

AdSchellevis avatar Jan 25 '23 10:01 AdSchellevis

The routing table is OK... we were able to log in via WAN with SSH and the UI; only locally generated packets (DNS queries) were using 0.0.0.0 as the source. Sadly, I had already left the session.

mimugmail avatar Jan 25 '23 10:01 mimugmail

I have what I think is the same issue. I've been chasing this for a long time, but I'm not a network specialist, so I was assuming it was just my lack of expertise.

In case it's useful: I can ping from OPNsense when specifying a source address (-S), but a regular ping times out. LAN traffic is routed correctly; it's just local system traffic that doesn't go anywhere.

Something else I observed when the issue started: with a VPN tunnel up, client traffic was routed over the VPN, but "normal" traffic stopped working. Taking the tunnel down, traffic started flowing again, in case that's related.

I'm running 22.7.4 on ESXi. Happy to do troubleshooting!

Pinoir avatar Jan 29 '23 22:01 Pinoir

The cases I have seen relate to missing link addresses in the routing table; netstat -nr -4 would easily tell you if that's the case. If for some reason the address is removed but not added again, it would explain the behavior (https://github.com/opnsense/core/commit/a230326d7fe165e597cd2d5a30b064e0b3a1c58c caused that, but there might be other reasons, like a DHCP client not playing nicely).
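A quick sketch of that check (example addresses; the heredoc stands in for real netstat -nr -4 output, and deliberately reproduces the broken state where the interface's own address route is missing):

```shell
# Check that the two entries backing an interface exist in the table:
# a link# route for the connected network and a host route for the
# interface's own address. addr/net are hypothetical example values.
addr="192.0.2.121"; net="192.0.2.0/24"
table=$(cat <<'EOF'
default            192.0.2.1          UGS        igc1
192.0.2.0/24       link#2             U          igc1
EOF
)
net_route=$(printf '%s\n' "$table" | grep -c "^$net " || true)
addr_route=$(printf '%s\n' "$table" | grep -c "^$addr " || true)
echo "network route entries: $net_route (expect 1)"
echo "address route entries: $addr_route (expect 1; 0 reproduces the bug)"
```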

AdSchellevis avatar Jan 30 '23 07:01 AdSchellevis

This is what I have. vmx0 is WAN, vmx1 is LAN.

Routing tables

  Internet:
  Destination        Gateway            Flags     Netif Expire
  default            192.168.0.1        UGS       vmx0
  8.8.4.4            192.168.0.1        UGHS      vmx0
  8.8.8.8            192.168.0.1        UGHS      vmx0
  127.0.0.1          link#4             UH        lo0
  192.168.0.0/24     link#1             U         vmx0
  192.168.0.1        link#1             UHS       vmx0
  192.168.0.250      link#1             UHS       lo0
  192.168.10.0/23    link#2             U         vmx1
  192.168.11.250     link#2             UHS       lo0

Pinoir avatar Jan 30 '23 09:01 Pinoir

what does route show 8.8.8.8 result in?

AdSchellevis avatar Jan 30 '23 09:01 AdSchellevis

     route to: dns.google
  destination: dns.google
      gateway: 192.168.0.1
          fib: 0
    interface: vmx0
        flags: <UP,GATEWAY,HOST,DONE,STATIC>
   recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
         0         0         0         0       1500         1         0

Pinoir avatar Jan 30 '23 12:01 Pinoir

ok, that's good, it doesn't look like a routing issue then. Next question is about the NAT rules; what do they look like:

grep '^nat on' /tmp/rules.debug

AdSchellevis avatar Jan 30 '23 12:01 AdSchellevis

  nat on vmx0 inet from (vmx1:network) to any port 500 -> (vmx0:0) static-port # Automatic outbound rule
  nat on vmx0 inet from (lo0:network) to any port 500 -> (vmx0:0) static-port # Automatic outbound rule
  nat on vmx0 inet from 127.0.0.0/8 to any port 500 -> (vmx0:0) static-port # Automatic outbound rule
  nat on vmx0 inet from (vmx1:network) to any -> (vmx0:0) port 1024:65535 # Automatic outbound rule
  nat on vmx0 inet from (lo0:network) to any -> (vmx0:0) port 1024:65535 # Automatic outbound rule
  nat on vmx0 inet from 127.0.0.0/8 to any -> (vmx0:0) port 1024:65535 # Automatic outbound rule

Pinoir avatar Jan 30 '23 13:01 Pinoir

Just to be sure: you are not able to ping 8.8.8.8 from this machine? If that's the case, it's probably a good idea to capture some traffic first. So far everything looks normal; I don't expect your machine is sending out traffic with address 0.0.0.0.

AdSchellevis avatar Jan 30 '23 13:01 AdSchellevis