esp-idf-svc

Weird behaviour when both Eth and Wifi are started with static IP config

Open thetek42 opened this issue 9 months ago • 34 comments

In our application, we are starting both Ethernet and Wi-Fi. When a static IP is configured for both interfaces and an Ethernet cable is connected to the ESP32, connections to a server cannot be made under certain circumstances.

Notes:

  • There are no Wi-Fi credentials configured, but the driver is started anyway. .connect() is not called.
  • When DHCP is enabled for both Ethernet and Wi-Fi, everything is fine. Only static IP config seems to be affected.
  • When Ethernet is configured to use a static IP and Wi-Fi is set to use DHCP (or vice-versa), the connection cannot be established at all.
  • When connecting to Wi-Fi with credentials and then disconnecting once an Ethernet cable has been plugged in, the connection can be established every time. I am unsure how connecting to Wi-Fi in advance and then disconnecting (without stopping the driver) makes Ethernet work all of a sudden.
  • Looking at the network communication with Wireshark, no packets are transmitted via Ethernet.

These are the cases that I observed:

  • Case 1:
    • start Ethernet
    • start Wi-Fi
    • try to connect to server, connection fails
  • Case 2:
    • start Wi-Fi
    • start Ethernet
    • try to connect to server, connection succeeds
  • Case 3:
    • start Ethernet (without waiting for it to finish)
    • start Wi-Fi
    • wait for Ethernet to finish (.wait_netif_up())
    • try to connect to server. Here, my observations varied. With BlockingEth, I could never get a connection to a server. With AsyncEth, I got a connection in approximately 1 in 5 cases. When I tried the same thing with AsyncEth again some time later, I could not get a connection at all, not even at the roughly 20% rate I had observed earlier.

The behaviour for "connect to Wi-Fi but not to Ethernet" (by not having an Ethernet cable connected) has the exact reverse effect: starting Wi-Fi and then Ethernet causes the connection to fail, but the reverse causes the connection to succeed.

Thus, the interface that was started last "determines" which interface can work with a static IP. Again, with DHCP none of this matters: it works every time, no matter in which order the interfaces were started.

Below is a piece of sample code that can be used to (hopefully) reproduce the issue. Feel free to play around a bit by moving stuff around (e.g. putting Wi-Fi after Ethernet or putting the eth.wait_*() block after the Wi-Fi code).

Example code
#![allow(unused_imports)]

use std::io::{Read, Write};
use std::net::{Ipv4Addr, TcpStream};

use esp_idf_svc::eth::{EspEth, EthDriver, RmiiEthChipset, RmiiClockConfig, BlockingEth, AsyncEth};
use esp_idf_svc::eventloop::EspSystemEventLoop;
use esp_idf_svc::hal::gpio::{self, PinDriver};
use esp_idf_svc::hal::prelude::Peripherals;
use esp_idf_svc::ipv4::{ClientConfiguration, Configuration, ClientSettings, Subnet, Mask, DHCPClientSettings, RouterConfiguration};
use esp_idf_svc::log::EspLogger;
use esp_idf_svc::netif::{NetifConfiguration, EspNetif};
use esp_idf_svc::timer::EspTaskTimerService;
use esp_idf_svc::wifi::{self, WifiDriver, EspWifi, BlockingWifi, AuthMethod, AsyncWifi};

fn main() -> anyhow::Result<()> {
    esp_idf_svc::sys::link_patches();
    EspLogger::initialize_default();

    let peripherals = Peripherals::take()?;
    let pins = peripherals.pins;
    let sys_loop = EspSystemEventLoop::take()?;
    let timer_service_wifi = EspTaskTimerService::new()?;
    let timer_service_eth = EspTaskTimerService::new()?;



    let mut eth_pwr = PinDriver::output(pins.gpio5)?;
    let mut clk_en = PinDriver::output(pins.gpio4)?;
    eth_pwr.set_low()?;
    clk_en.set_low()?;
    std::thread::sleep(std::time::Duration::from_millis(100));
    eth_pwr.set_high()?;
    std::thread::sleep(std::time::Duration::from_millis(10));
    clk_en.set_high()?;
    std::thread::sleep(std::time::Duration::from_millis(10));

    log::info!("--- EthDriver");
    let eth_driver = EthDriver::new(
        peripherals.mac,
        pins.gpio25,
        pins.gpio26,
        pins.gpio27,
        pins.gpio23,
        pins.gpio22,
        pins.gpio21,
        pins.gpio19,
        pins.gpio18,
        RmiiClockConfig::<gpio::Gpio0, gpio::Gpio16, gpio::Gpio17>::Input(pins.gpio0),
        Some(pins.gpio33),
        RmiiEthChipset::LAN87XX,
        None,
        sys_loop.clone(),
    )?;

    log::info!("--- EspNetif");
    let netif = EspNetif::new_with_conf(&NetifConfiguration {
        ip_configuration: Configuration::Client(ClientConfiguration::Fixed(ClientSettings {
            // using a different ip address than wi-fi does not make it work.
            // ip: "192.168.178.234".parse()?,
            ip: "192.168.178.235".parse()?,
            subnet: Subnet {
                gateway: "192.168.178.1".parse()?,
                mask: Mask(24),
            },
            dns: Some("192.168.178.1".parse()?),
            secondary_dns: None,
        })),
        ..NetifConfiguration::eth_default_client()
    })?;

    log::info!("--- EspEth");
    let eth = EspEth::wrap_all(eth_driver, netif)?;

    log::info!("--- AsyncEth");
    let mut eth = AsyncEth::wrap(eth, sys_loop.clone(), timer_service_eth)?;
    esp_idf_svc::hal::task::block_on(async {
        log::info!("--- eth.start()");
        eth.start().await?;
        log::info!("--- eth.wait_connected()");
        eth.wait_connected().await?;
        log::info!("--- eth.wait_netif_up()");
        eth.wait_netif_up().await
    })?;



    log::info!("--- WifiDriver");
    let wifi_driver = WifiDriver::new(peripherals.modem, sys_loop.clone(), None)?;

    let netif_sta = EspNetif::new_with_conf(&NetifConfiguration {
        ip_configuration: Configuration::Client(ClientConfiguration::Fixed(ClientSettings {
            ip: "192.168.178.234".parse()?,
            subnet: Subnet {
                gateway: "192.168.178.1".parse()?,
                mask: Mask(24),
            },
            dns: Some("192.168.178.1".parse()?),
            secondary_dns: None,
        })),
        ..NetifConfiguration::wifi_default_client()
    })?;

    let netif_ap = EspNetif::new_with_conf(&NetifConfiguration::wifi_default_router())?;

    log::info!("--- EspWifi");
    let wifi = EspWifi::wrap_all(wifi_driver, netif_sta, netif_ap)?;

    log::info!("--- AsyncWifi");
    let mut wifi = AsyncWifi::wrap(wifi, sys_loop.clone(), timer_service_wifi)?;

    log::info!("--- wifi.set_configuration()");
    wifi.set_configuration(&wifi::Configuration::Client(wifi::ClientConfiguration {
        ssid: "".try_into().unwrap(),
        bssid: None,
        auth_method: AuthMethod::WPA2Personal,
        password: "".try_into().unwrap(),
        channel: None,
    }))?;

    log::info!("--- wifi.start()");
    esp_idf_svc::hal::task::block_on(wifi.start())?;



    log::info!("--- TcpStream");
    let mut stream = TcpStream::connect((Ipv4Addr::new(93, 184, 215, 14), 80))?;
    stream.write_all(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")?;

    log::info!("--- response");
    let mut buf = [0; 512];
    loop {
        let read = stream.read(&mut buf)?;
        print!("{}", core::str::from_utf8(&buf[..read])?);
        if read == 0 {
            break;
        }
    }
    log::info!("--- done");

    Ok(())
}

thetek42, May 03 '24 11:05
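The final step of the reproduction calls `TcpStream::connect` without a timeout and decodes each 512-byte chunk separately. Below is a minimal std-only sketch of a variant that makes a failing attempt easier to observe: the connect, reads and writes are bounded by a timeout, the request asks the server to close the connection so EOF ends the loop, and the response is decoded once at the end. The helper names `http_get` and `serve_once`, the 5-second timeout and the loopback server (added only so the sketch runs on a host) are assumptions. Whether `connect_timeout` behaves the same on the ESP-IDF std port has not been verified here.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Sends a minimal HTTP/1.1 GET to `addr` and returns the whole response.
/// Every blocking step is bounded by `timeout`, so a connection that goes
/// nowhere surfaces as an `io::Error` instead of waiting on the OS defaults.
fn http_get(addr: SocketAddr, host: &str, timeout: Duration) -> std::io::Result<String> {
    let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;

    // "Connection: close" asks the server to close the socket after replying,
    // so the read loop below ends on EOF instead of waiting for keep-alive.
    let request = format!("GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes())?;

    let mut response = Vec::new();
    let mut buf = [0u8; 512];
    loop {
        let read = stream.read(&mut buf)?;
        if read == 0 {
            break; // EOF: the server closed the connection.
        }
        response.extend_from_slice(&buf[..read]);
    }
    // Decode once at the end; decoding each chunk separately can fail when a
    // multi-byte UTF-8 character is split across two reads.
    Ok(String::from_utf8_lossy(&response).into_owned())
}

/// Hypothetical loopback helper so the sketch runs on a host: accepts one
/// connection, reads the request headers, sends `response`, then closes.
fn serve_once(response: &'static str) -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind loopback");
    let addr = listener.local_addr().expect("query local address");
    thread::spawn(move || {
        if let Ok((mut conn, _)) = listener.accept() {
            let mut request = Vec::new();
            let mut byte = [0u8; 1];
            // Drain the request up to the blank line that ends the headers.
            while !request.ends_with(b"\r\n\r\n") {
                match conn.read(&mut byte) {
                    Ok(1) => request.push(byte[0]),
                    _ => return,
                }
            }
            let _ = conn.write_all(response.as_bytes());
            // `conn` is dropped here, closing the socket and signalling EOF.
        }
    });
    addr
}

fn main() -> std::io::Result<()> {
    // On the ESP32 one would pass SocketAddr::from((Ipv4Addr::new(93, 184, 215, 14), 80))
    // and "example.com" instead of this loopback server.
    let addr = serve_once("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nhi");
    let response = http_get(addr, "localhost", Duration::from_secs(5))?;
    print!("{response}");
    Ok(())
}
```

With a timeout in place, a connection routed out of the wrong interface shows up as a timed-out or refused `io::Error` after a bounded delay.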

Do any of your problems happen, if you replace "example.com" with an IP address?

ivmarkov, May 03 '24 11:05

Yes. It does not matter if I use example.com or the corresponding IP address.

thetek42, May 03 '24 11:05

You are sure 100% about that (as in you have used IpAddr and not a string that might still trigger DNS resolution somehow)?

ivmarkov, May 03 '24 12:05

Your example is way too big, and I have difficulty understanding what exactly you are trying to achieve...

This seems very odd though:

let netif_ap = EspNetif::new_with_conf(&NetifConfiguration {
    ip_configuration: Configuration::Client(ClientConfiguration::DHCP(DHCPClientSettings {
        hostname: None,
    })),
    ..NetifConfiguration::wifi_default_router()
})?;

How is this supposed to even work?

ivmarkov, May 03 '24 12:05

How is this supposed to even work?

That must have snuck in there somehow. I had it set to the correct configuration previously, but it seems like I somehow managed to clone the config into the wrong place 🤷. It does not affect the result though.

thetek42, May 03 '24 12:05

Even Fixed is wrong. You can't set a client config on an access point and expect not to have issues w.r.t. routing on IP level. Why are you configuring the access point in the first place?

Can you: a) try to simplify this example to the bare minimum, removing everything that is not necessary, including DNS resolution and whatnot; b) try to report only one problem at first, with an example for that single problem?

ivmarkov, May 03 '24 12:05

You are sure 100% about that (as in you have used IpAddr and not a string that might still trigger DNS resolution somehow)?

I specified it as (Ipv4Addr::new(93, 184, 215, 14), 80).

thetek42, May 03 '24 12:05
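For reference, Rust's std never consults a resolver for an `(Ipv4Addr, u16)` tuple, and even for a string it first tries to parse an IP literal before falling back to a lookup. The short sketch below (std only, no network access) shows both conversions yielding the same `SocketAddr`. This is why the `(Ipv4Addr::new(93, 184, 215, 14), 80)` form used in the reproduction rules DNS out as a factor.

```rust
use std::net::{Ipv4Addr, SocketAddr, ToSocketAddrs};

fn main() -> std::io::Result<()> {
    let expected = SocketAddr::from((Ipv4Addr::new(93, 184, 215, 14), 80));

    // (Ipv4Addr, u16): converted directly, no resolver involved.
    let from_tuple: Vec<SocketAddr> = (Ipv4Addr::new(93, 184, 215, 14), 80)
        .to_socket_addrs()?
        .collect();

    // &str: std first tries to parse an "ip:port" literal and only calls
    // the system resolver if that parse fails.
    let from_literal: Vec<SocketAddr> = "93.184.215.14:80".to_socket_addrs()?.collect();

    assert_eq!(from_tuple, vec![expected]);
    assert_eq!(from_literal, vec![expected]);
    println!("{expected}"); // prints 93.184.215.14:80
    Ok(())
}
```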

I updated the code in the original post.

thetek42, May 03 '24 12:05

Also, this issue has nothing to do with the Access Point. EspWifi::wrap_all simply requires a netif for the AP to be present.

thetek42, May 03 '24 12:05

Maybe. But it has a default configuration and you are changing it. Maybe it does not matter, as the Wifi is not in Mixed mode, but maybe it does.

My point is the following: if you leave the example as-is, with all the cases you have enumerated, it might take me days, if not weeks, to get there.

And if/when I get there, I'll start anyway by eliminating every single line of code in the example that should not be there, since each one is an additional unknown variable that only confuses and obfuscates the issue. Like the DNS issue. Or configuring the AP. Etc. And then I'll try to tackle just one of the cases you have enumerated that should work but doesn't.

... or you could try to do it, and then I can help with thinking / brainstorming. I think it might be faster that way. :)

ivmarkov, May 03 '24 13:05

I updated the code in the original post.

In your "simplified example"... why are you setting the eth and wifi-sta to the same IP, namely "192.168.178.234"?

ivmarkov, May 03 '24 13:05

Also, for the netif_ap (I know it should not affect anything, but still), can you please use the default network configuration, which would then differ from the 192.168.178.1/24 that you are setting everywhere?

ivmarkov, May 03 '24 13:05

@thetek42 What happens once you assign different static IPs to Eth and Wifi?

ivmarkov, May 06 '24 18:05

Sorry for the delay, I was busy the last couple of days.

Using the default values for the AP netif does not change anything.

Using different IP addresses for Wi-Fi and Ethernet does in fact change the behaviour, but it makes it worse: when both have a different static IP address configured, the connection to the server fails no matter in which order Wi-Fi and Ethernet are initialized.

Most of the time, the device does not even show up in my router's user interface when that happens. However, I did see it show up in the list of connections once, and then never again. In that incident, it showed the "device name" as "PC-192-168-178-235" but listed the IP address as 192.168.178.234, while the static IP configuration was 192.168.178.235 for Ethernet and 192.168.178.234 for Wi-Fi. I am unsure whether this was just a bug in the router or the actual behaviour. That the device usually does not show up in the list makes sense, given that Wireshark showed no transmitted packets, but the fact that it appeared exactly once does not make sense to me.

The more I investigate this issue, the more confused I am.

Again, I will update the code in the original post to use the default values for the AP config and two different static IP addresses.

thetek42, May 07 '24 12:05

One little detail that is interesting: what list do you mean? The router does not keep any "list" of devices with static IP addresses (let alone knowing their host names) as it is just not aware of these devices.

More advanced routers might produce a "list" based on tx/rx packet statistics and potentially, based on nat statistics.

Btw: have you tried pinging the eth and wifi interfaces from another PC?

Another experiment would be to assign the wifi interface a static IP from a completely different subnet (even if the wifi won't be reachable this way).

ivmarkov, May 07 '24 13:05
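Before running the different-address and different-subnet experiments suggested above, it can help to check mechanically which case a pair of static configurations falls into. Below is a small host-side sketch. `classify` and `Overlap` are hypothetical helpers and are not part of esp-idf-svc. The prefix length corresponds to the value passed as `Mask(24)` in the reproduction. The sketch only checks addressing; it says nothing about which interface lwIP actually picks for outgoing traffic.

```rust
use std::net::Ipv4Addr;

/// Hypothetical classification of two static IPv4 interface configurations.
#[derive(Debug, PartialEq, Eq)]
enum Overlap {
    SameAddress,
    SameSubnet,
    Disjoint,
}

/// Netmask for a prefix length, e.g. the 24 passed as `Mask(24)`.
fn mask(prefix: u8) -> u32 {
    assert!(prefix <= 32, "prefix length must be in 0..=32");
    // checked_shl returns None for a shift of 32 (prefix 0), i.e. an all-zero mask.
    u32::MAX.checked_shl(32 - u32::from(prefix)).unwrap_or(0)
}

/// True if `addr` lies inside the network `net/prefix`.
fn in_network(addr: Ipv4Addr, net: Ipv4Addr, prefix: u8) -> bool {
    let m = mask(prefix);
    (u32::from(addr) & m) == (u32::from(net) & m)
}

fn classify(a: Ipv4Addr, a_prefix: u8, b: Ipv4Addr, b_prefix: u8) -> Overlap {
    if a == b {
        Overlap::SameAddress
    } else if in_network(a, b, b_prefix) || in_network(b, a, a_prefix) {
        Overlap::SameSubnet
    } else {
        Overlap::Disjoint
    }
}

fn main() {
    let eth = Ipv4Addr::new(192, 168, 178, 235);
    let sta = Ipv4Addr::new(192, 168, 178, 234);
    // 10.0.0.5 is an arbitrary example address from another subnet.
    let other = Ipv4Addr::new(10, 0, 0, 5);

    println!("{:?}", classify(sta, 24, sta, 24)); // SameAddress
    println!("{:?}", classify(eth, 24, sta, 24)); // SameSubnet
    println!("{:?}", classify(eth, 24, other, 24)); // Disjoint
}
```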

I was talking about the list that the router provides in the user interface of its website.

thetek42, May 07 '24 13:05

A different IP in a different subnet for the Wi-Fi interface does not have any effect.

Pinging the Ethernet interface does work.

thetek42, May 07 '24 13:05

Also, when setting the Ethernet IP address to .235 and the Wi-Fi IP address to .234, I can ping both, even though Wi-Fi is not connected.

thetek42, May 07 '24 13:05

I was talking about the list that the router provides in the user interface of its website.

See above. :) I would not trust this list for static IPs at all. :)

ivmarkov, May 07 '24 14:05

Also, when setting the Ethernet IP address to .235 and the Wi-Fi IP address to .234, I can ping both, even though Wi-Fi is not connected.

OK, hold on. So just to summarize. What you are saying:

  • Both interfaces are pingable from outside the Esp
  • The Wifi is not connected

Are you sure the Wifi is not connected? Otherwise, how would pinging that IP work?! Are you sure you don't have an IP conflict in your network, and that you are not pinging something other than the Esp?

ivmarkov, May 07 '24 14:05

I would not trust this list for static IPs at all. :)

I don't! :) I was just looking to see if the router recognized it at all. The weird results are to be expected.

thetek42, May 07 '24 14:05

I am 100% sure the Wi-Fi is not connected. The SSID and passphrase are both set to an empty string, and .connect() isn't even called. There is no NVS in which Wi-Fi credentials might be stored. The .234 and .235 IP addresses are not taken up by any other device.

Interestingly enough, this double IP phenomenon thingy only happens sometimes.

thetek42, May 07 '24 14:05

  1. OK, let's pretend that the second (wifi) IP was never pingable (or else I should start believing in miracles, OR there is a nasty bug/behavior in esp-idf somewhere where the wifi netif is operational even though its phy layer is not, i.e. it gets ethernet packets from the eth phy layer, which also sounds very unlikely).

  2. Let's also forget about the router list, shall we? If you used that instead of pinging the static IPs from the outside, the results were wrong all along.

Let's concentrate on the one remaining issue: from inside the esp, can you ping the gateway on the eth interface? And then - as a second step only - does opening the socket work?

ivmarkov, May 07 '24 14:05

Obviously, I used the ping command for pinging the esp32.

It seems that it only happens for approximately 15-20 seconds after flashing new firmware onto the device with a different static IP than the previous firmware. After that timespan, it does not occur again. The only way I can explain that is some nasty caching going on somewhere, not necessarily on esp-idf's side.

Pinging the gateway from the esp32 shows exactly the same behaviour as opening a socket. For this, I used EspPing. However, I was a bit unsure what to set the interface to, so I left it at 0. From what I gathered, you are supposed to put the netif index there, but obtaining the netif index via .get_index() and passing that to EspPing resulted in no ping succeeding, even when only Ethernet was enabled in the code. Is there something I missed or misunderstood?

thetek42, May 07 '24 15:05

Pinging the gateway from the esp32 shows exactly the same behaviour as opening a socket. For this, I used EspPing. However, I was a bit unsure what to set the interface to, so I left it at 0. From what I gathered, you are supposed to put the netif index there, but obtaining the netif index via .get_index() and passing that to EspPing resulted in no ping succeeding, even when only Ethernet was enabled in the code. Is there something I missed or misunderstood?

By the way, what are you trying to ping, and did you check that it is pingable in the first place?

ivmarkov, May 07 '24 15:05

I am pinging the gateway (192.168.178.1).

thetek42, May 07 '24 15:05

OK, assuming it is pingable from inside (99.9% of routers are, but you never know, and you should check it from a PC), can you try with just the Ethernet connection, without even creating the wifi driver? Until we get a reliable ping, we cannot progress any further. Try with get_index() + 1, or with index = 1 and/or = 2, until it works.

ivmarkov, May 07 '24 15:05

And just to confirm again: none of these issues happen if you don't create the wifi driver in the first place? But once you create it, even if you don't start it, issues start on the eth interface?

ivmarkov, May 07 '24 16:05

  • Pinging the router from a PC works as expected.
  • Pinging the router with EspPing::default():
    • Only works when the Ethernet driver is created after the Wi-Fi driver.
    • When Ethernet and Wi-Fi have differing static IP configs, the TCP connection to a server is not possible. EspPing seems unaffected by this and continues working even with two different static IPs.
  • Pinging the router directly via the Ethernet interface (requires get_index() + 1) works, even when creating the Wi-Fi interface after the Ethernet interface. I guess that is to be expected since we are directly using the Ethernet interface. The get_index() + 1 behaviour could be fixed or at least documented somewhere, so that people won't stumble over it in the future.
  • Not starting the Wi-Fi driver in the first place causes everything to work as expected. The problems only occur when calling wifi.start(). Just creating but not starting is fine.

thetek42, May 10 '24 07:05

@thetek42 Weird. Did you just delete your last comment from 5 minutes ago?

ivmarkov, May 17 '24 08:05