Can't install add-ons if port 80 is blocked

Open mtdcr opened this issue 3 years ago • 12 comments

Describe the issue you are experiencing

There's a connectivity check in buildroot-external/rootfs-overlay/etc/NetworkManager/NetworkManager.conf using http://version.home-assistant.io/online.txt as its target. If this check fails, because port 80 is firewalled, Cloudflare is down or nic.io has problems, Home Assistant will complain about being offline and won't allow installing any add-ons. Curiously, I was able to install one add-on on a fresh setup before this check became a problem.
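For readers unfamiliar with that file, here is a minimal sketch of how the `[connectivity]` section could be inspected. Only the `uri` value is taken from this issue; the `interval=600` line, the 300-second fallback and the helper name `read_connectivity_settings` are assumptions made for illustration, and the real file may contain other keys.

```python
import configparser

# Hypothetical excerpt of NetworkManager.conf; only the uri comes from this issue.
SAMPLE_CONF = """
[connectivity]
uri=http://version.home-assistant.io/online.txt
interval=600
"""


def read_connectivity_settings(text):
    """Return the connectivity-check settings found in the given config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("connectivity"):
        return None  # no connectivity check configured in this text
    section = parser["connectivity"]
    uri = section.get("uri", "")
    return {
        "uri": uri,
        # Fallback of 300 seconds is an assumption for this sketch.
        "interval": section.getint("interval", fallback=300),
        # True when the check depends on plain HTTP (port 80) being reachable.
        "uses_plain_http": uri.startswith("http://"),
    }


settings = read_connectivity_settings(SAMPLE_CONF)
print(settings)
```

With the target above, `uses_plain_http` is true, which is exactly the dependency on port 80 described in this report.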

Supervisor was already modified to use HTTPS in June (https://github.com/home-assistant/supervisor/commit/bcef34012d6c93b05b547281eba20c152e39e505).

With most services being available over HTTPS today, it doesn't seem appropriate to require HTTP for this check.

A way to disable/override this test would be useful, as most machines running Home Assistant are probably on a permanent internet connection, and a check from the past doesn't say anything about connectivity now or in the near future. At least, the result of this check should not prevent people from operating Home Assistant normally.

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

7.0

Did you upgrade the Operating System.

No

Steps to reproduce the issue

  1. Block outgoing traffic on port 80
  2. Install haos
  3. Try to install add-ons, wait a while, try again

Anything in the Supervisor logs that might be useful for us?

21-12-21 00:55:17 WARNING (MainThread) [supervisor.jobs] 'AddonManager.install' blocked from execution, no host internet connection

Anything in the Host logs that might be useful for us?

Sorry, can't choose any log provider in current /hassio/system, but I guess there's not much info needed anyway.

System Health information

No response

Additional information

No response

mtdcr avatar Dec 21 '21 00:12 mtdcr

With most services being available over HTTPS today, it doesn't seem appropriate to require HTTP for this check.

Using http is encouraged by NetworkManager (see https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=eab32a5252e82361a563154cd8bfc3949aaad119). There is a chicken-egg problem here: TLS connections need a correct real-time clock on the client. We used to run into problems that the connectivity check failed due to wrong clock on the client (which is quite common since some single-board computers lack a proper RTC). We are not the only distribution doing this, e.g. Ubuntu uses http://connectivity-check.ubuntu.com/ too.

A way to disable/override this test would be useful, as most machines running Home Assistant are probably on a permanent internet connection, and a check from the past doesn't say anything about connectivity now or in the near future.

Afaik, NetworkManager retries when it sees changes in network setup. Just trying to connect kinda defeats the purpose of the connectivity check :sweat_smile:

agners avatar Dec 23 '21 16:12 agners

On Thu, 23 Dec 2021 08:45:48 -0800 Stefan Agner @.***> wrote:

With most services being available over HTTPS today, it doesn't seem appropriate to require HTTP for this check.

Using http is encouraged by NetworkManager (see https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=eab32a5252e82361a563154cd8bfc3949aaad119). There is a chicken-egg problem here: TLS connections need a correct real-time clock on the client. We used to run into problems that the connectivity check failed due to wrong clock on the client (which is quite common since some single-board computers lack a proper RTC). We are not the only distribution doing this, e.g. Ubuntu uses http://connectivity-check.ubuntu.com/ too.

If the TLS connection fails because the certificate couldn't be verified, I'd say we're online, right? You don't take into account the HTTP status code either, do you?

A way to disable/override this test would be useful, as most machines running Home Assistant are probably on a permanent internet connection, and a check from the past doesn't say anything about connectivity now or in the near future.

Afaik, NetworkManager retries when it sees changes in network setup. Just trying to connect kinda defeats the purpose of the connectivity check :sweat_smile:

That's because the connectivity check is useless. The internet is decentralized for a reason. What does the connectivity check do other than just trying to connect? What's the benefit of the connection check telling me I'm online when my connection breaks right after that? What's the benefit if the route to cloudflare breaks but the routes to my important services keep working?

Please add an option to disable this check (so that disabled means to pretend we're always online). This saves traffic, so it's good for the environment.

mtdcr avatar Dec 23 '21 17:12 mtdcr

If the TLS connection fails because the certificate couldn't be verified, I'd say we're online, right?

We are online, but can't use any service using https (which basically is all of them).

You don't take into account the HTTP status code either, do you?

Not sure.
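As a rough illustration of that open question, this is how I read the rules documented in the NetworkManager.conf man page: a reply counts as online when it carries the header `X-NetworkManager-Status: online` or when the body starts with the configured `response` text (default "NetworkManager is online"), and an empty `response` setting expects status 204 or no data. The function name `classify_response` and the state names are assumptions for this sketch. It is a simplification, not NetworkManager's actual code, and it does not settle whether the status code matters in other cases.

```python
def classify_response(status, headers, body, expected="NetworkManager is online"):
    """Classify a connectivity probe result as 'full', 'portal' or 'limited'.

    status is None when no HTTP reply was received at all.
    """
    if status is None:
        return "limited"  # nothing answered: no working path to the target

    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {key.lower(): value for key, value in headers.items()}
    if normalized.get("x-networkmanager-status", "").strip().lower() == "online":
        return "full"

    if expected == "":
        # Empty expectation: the server should answer 204 or send no data.
        return "full" if status == 204 or body == "" else "portal"

    # Only a prefix comparison is made, not an exact match.
    if body.startswith(expected):
        return "full"

    # Something answered, but not with the expected content (e.g. a login page).
    return "portal"


print(classify_response(200, {}, "NetworkManager is online\n"))
print(classify_response(200, {}, "<html>Please log in</html>"))
print(classify_response(None, {}, ""))
```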

That's because the connectivity check is useless. The internet is decentralized for a reason. What does the connectivity check do other than just trying to connect? What's the benefit of the connection check telling me I'm online when my connection breaks right after that? What's the benefit if the route to cloudflare breaks but the routes to my important services keep working?

This is NetworkManager, not "me". I trust the fine developers to do the "right" thing. You are welcome to start a discussion with them about implementation details etc. Also note that it works for 100k+ users (https://analytics.home-assistant.io/).

Please add an option to disable this check (so that disabled means to pretend we're always online). This saves traffic, so it's good for the environment.

I am not opposed to that idea. Afaik connection check can be disabled on NetworkManager level. We could add a flag to Supervisor, and pass that flag down via D-Bus to NetworkManager.

agners avatar Dec 27 '21 09:12 agners

On Mon, 27 Dec 2021 01:40:17 -0800 Stefan Agner @.***> wrote:

If the TLS connection fails because the certificate couldn't be verified, I'd say we're online, right?

We are online, but can't use any service using https (which basically is all of them).

This is in no way different from the status quo, using HTTP for this check.

Does an unset clock on the host imply an unset clock on the supervisor and core? Anyway, this is an orthogonal problem, already solved by NTP (and the NTP settings provisioned by DHCP should be respected, of course).

If no RTC is available and a semi-valid clock is needed at boot, before NTP gets available, systemd solved this problem a long time ago. On boot, it can restore the timestamp saved during the last shutdown, or it can set the clock to a built-in timestamp. This makes sure the host won't be stuck too far in the past and it maintains an increasing clock across reboots and power cycles.
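The fallback described above amounts to never letting the clock start earlier than the newest timestamp the system already knows about. A minimal sketch of that rule follows; the function name `boot_clock` and the example dates are made up for illustration, and systemd's real implementation differs in detail.

```python
from datetime import datetime, timezone


def boot_clock(rtc_time, saved_time, build_time):
    """Pick the initial system time at boot.

    rtc_time:   time reported by the hardware clock, or None if there is no RTC
    saved_time: timestamp stored at the last shutdown, or None on first boot
    build_time: timestamp baked into the image; always available
    """
    candidates = [t for t in (rtc_time, saved_time, build_time) if t is not None]
    # The latest known timestamp wins, so the clock never runs backwards
    # across reboots and is never older than the image itself.
    return max(candidates)


build = datetime(2021, 6, 1, tzinfo=timezone.utc)
saved = datetime(2021, 12, 20, 23, 0, tzinfo=timezone.utc)

# Board without an RTC: the saved shutdown timestamp is the best guess.
print(boot_clock(None, saved, build))

# First boot without an RTC: only the build timestamp is known.
print(boot_clock(None, None, build))
```

A clock chosen this way is still only approximately right, but it lands close enough for certificate validity windows in most cases until NTP takes over.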

That's because the connectivity check is useless. The internet is decentralized for a reason. What does the connectivity check do other than just trying to connect? What's the benefit of the connection check telling me I'm online when my connection breaks right after that? What's the benefit if the route to cloudflare breaks but the routes to my important services keep working?

This is NetworkManager, not "me". I trust the fine developers to do the "right" thing. You are welcome to start a discussion with them about implementation details etc. Also note that it works for 100k+ users (https://analytics.home-assistant.io/).

I won't discuss settings used by Home Assistant OS and choices made by developers with the NetworkManager community. Nobody gets forced to use this feature of NetworkManager in their application. I don't want to convince them to change their tool. I'd like to convince you that it's the wrong tool for this very specific task. It's nothing more than a binary_sensor for the availability of version.home-assistant.io, with the exception that I can't decide to turn off automations connected to it (yet).

If the network interface has link, an address and a route, it's online. If it can't reach a particular service, then it's either the interface itself, the service or something in between that's disconnected. It's irrational to use this information to influence application logic in software that's meant to be independent from cloud services. And it's not like this would save any developer from proper error handling in network code.

Most people don't limit traffic, because they can't, don't know why or how, and don't want to spend time debugging things. That doesn't mean they prefer unencrypted communication protocols. And if they do enforce limits, they probably disable analytics, too, or may be unable to connect.

Please add an option to disable this check (so that disabled means to pretend we're always online). This saves traffic, so it's good for the environment.

I am not opposed to that idea. Afaik connection check can be disabled on NetworkManager level. We could add a flag to Supervisor, and pass that flag down via D-Bus to NetworkManager.

Thank you, that would be really nice!

I'm just trying to switch from a manual setup to HAOS without giving up basic security properties.

Regards, Andreas

mtdcr avatar Dec 27 '21 14:12 mtdcr

If the TLS connection fails because the certificate couldn't be verified, I'd say we're online, right? We are online, but can't use any service using https (which basically is all of them).

This is in no way different from the status quo, using HTTP for this check.

Right, for the online check, bypassing the TLS validity check and relying on HTTP headers only to determine online status should work.
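A minimal sketch of that idea, assuming the probe is made with Python's standard library: a certificate that fails verification (for example because the local clock is wrong) still proves that a remote host answered, so it is treated as online. The names `classify_probe_error` and `probe` are hypothetical, and this is not how NetworkManager or Supervisor implement their checks.

```python
import ssl
import urllib.error
import urllib.request


def classify_probe_error(exc):
    """Map an exception raised by an HTTPS probe to 'online' or 'offline'."""
    # An HTTP error status (404, 500, ...) means a server did reply.
    if isinstance(exc, urllib.error.HTTPError):
        return "online"
    # urllib wraps low-level failures in URLError; look at the underlying reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    # The TLS handshake reached a peer, only the certificate check failed.
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "online"
    # DNS failures, unreachable networks, refused connections, timeouts.
    return "offline"


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Try to fetch url and report a coarse connectivity state."""
    try:
        with opener(url, timeout=timeout):
            return "online"
    except OSError as exc:  # URLError and ssl.SSLError both derive from OSError
        return classify_probe_error(exc)
```

The `opener` parameter exists only so the function can be exercised without network access; by default it performs a real request.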

Does an unset clock on the host imply an unset clock on the supervisor and core?

Time is "global" in Linux, so yes, unset clock on OS level implies unset clock everywhere.

Anyway, this is an orthogonal problem, already solved by NTP (and the NTP settings provisioned by DHCP should be respected, of course).

Correct, and that is what we use the online status for as well: We rely on NetworkManager-wait-online.service (provided by upstream NetworkManager, and as far as I understand relying on the connectivity check) to determine when we are online, and on systemd-time-wait-sync.service to wait until time is synchronized.

(and the NTP settings provisioned by DHCP should be respected, of course).

On my todo, see #689.

If no RTC is available and a semi-valid clock is needed at boot, before NTP gets available, systemd solved this problem a long time ago. On boot, it can restore the timestamp saved during the last shutdown, or it can set the clock to a built-in timestamp. This makes sure the host won't be stuck too far in the past and it maintains an increasing clock across reboots and power cycles.

Yes, we rely on all of those mechanisms.

I won't discuss settings used by Home Assistant OS and choices made by developers with the NetworkManager community.

When I use https, I get a warning from NetworkManager:

use of HTTPS for connectivity checking is not reliable and is discouraged (URI: %s)

If you don't agree on the implementation of the connectivity check, then you should discuss that with the NetworkManager folks.

Quite honestly, I think in 2021 using https and relying only on headers while ignoring TLS dates for that particular case might be the better choice. But I am not sure, I don't know why they say "HTTPS for connectivity checking is not reliable"... I haven't seen the million networks NM devs have seen, so for now I follow their recommendation. From what I can see, that error got implemented in 2015 (see https://bugzilla.gnome.org/show_bug.cgi?id=747866). Maybe it's worth discussing/reconsidering today?

Nobody gets forced to use this feature of NetworkManager in their application.

Nobody gets forced to use Home Assistant OS :smirk:

Honestly, not covering every use case is somewhat by design in HAOS: It's not as configurable. We try to make default decisions which work well for most users. If we support every use case, we essentially end up with Debian + Supervisor.

I don't want to convince them to change their tool. I'd like to convince you that it's the wrong tool for this very specific task. It's nothing more than a binary_sensor for the availability of version.home-assistant.io, with the exception that I can't decide to turn off automations connected to it (yet).

We have been fighting with reliability/race conditions connecting to various services (due to wrong time/not "really" online). The online status helps for many to get things up properly ordered: Wait until online -> sync NTP -> start Supervisor (which in turn downloads version etc.). Rather than reinvent our own wheel, using infrastructure provided by well maintained tools makes sense.

That said, it's not ideal the way it is: Booting without network takes longer than necessary (since systemd waits 90s before continuing), and the user has no good feedback on what went wrong. I have some rework in the back of my mind. But so far I was planning on continuing to use the NM online status.

agners avatar Dec 27 '21 15:12 agners

First of all I need to say that I totally agree with @agners on the fact that if a connectivity check to the internet has to be done then it has to be done using the tools someone else developed, just not to reinvent the wheel, and that, in the particular context of Network Manager, using HTTP is the correct thing to do.

On the other hand, I ask myself why the connectivity check functionality is used at all. According to the docs, the functionality serves two purposes (which I quote here for clarity):

  1. For one, it exposes a connectivity state on D-Bus, which other applications may use. For example, Gnome's portal helper uses this as signal to show a captive portal login page.
  2. The other use is that default-route of devices without global connectivity get a penalty of +20000 to the route-metric. This has the purpose to give a better default-route to devices that have global connectivity. For example, when being connected to WWAN and to a Wi-Fi network which is behind a captive portal, WWAN still gets preferred until login.
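The second purpose can be illustrated with a few lines of arithmetic. The penalty of 20000 is taken from the quoted documentation; the base metrics 600 and 700, the device names and both function names are assumptions chosen for the example.

```python
CONNECTIVITY_PENALTY = 20000  # added to the route metric without global connectivity


def effective_metric(base_metric, has_global_connectivity):
    """Route metric after applying the connectivity penalty, if any."""
    if has_global_connectivity:
        return base_metric
    return base_metric + CONNECTIVITY_PENALTY


def preferred_device(devices):
    """Name of the device whose default route wins (lowest effective metric)."""
    best = min(devices, key=lambda d: effective_metric(d["metric"], d["online"]))
    return best["name"]


# Wi-Fi is stuck behind a captive portal, WWAN has a working uplink.
before_login = [
    {"name": "wifi", "metric": 600, "online": False},
    {"name": "wwan", "metric": 700, "online": True},
]
# After logging in to the portal, Wi-Fi regains its normally better metric.
after_login = [
    {"name": "wifi", "metric": 600, "online": True},
    {"name": "wwan", "metric": 700, "online": True},
]

print(preferred_device(before_login))
print(preferred_device(after_login))
```

On a host with a single default route, as is typical for a Home Assistant OS box, the penalty never changes which route is chosen.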

The example given for purpose number one is a captive portal, which is not an environment where any service, Home Assistant included, would be deployed. Is there any need, which I don't recall at the moment, for exposing the connectivity state on D-Bus, apart from answering the question "is there a complete working path to the internet?", obviously? In this context, I kinda get what @mtdcr is saying here:

That's because the connectivity check is useless. The internet is decentralized for a reason. What does the connectivity check do other than just trying to connect? What's the benefit of the connection check telling me I'm online when my connection breaks right after that? What's the benefit if the route to cloudflare breaks but the routes to my important services keep working?

Regarding purpose number two, I think everyone agrees that either most users do not have such a setup in their Home Assistant installation or, if they have a dual WAN connection, it is handled on their router and not directly on the Home Assistant OS box. Basically, Home Assistant OS always has only one route to everywhere, and it is either taken from DHCP or set up statically.

Could @agners help me better understand the usage of this feature?

redgryphon avatar Mar 13 '22 12:03 redgryphon

For one, it exposes a connectivity state on D-Bus, which other applications may use. For example, Gnome's portal helper uses this as signal to show a captive portal login page.

In HA too the D-Bus API is used to learn the connectivity state of the host system. In our case, it is the Supervisor which uses the D-Bus API to understand if connectivity on the host system is available. It is shown in ha network info as host_internet. Supervisor uses the information internally to decide if a certain job should be executed (like installing an add-on).
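To make that gating concrete, here is a minimal sketch of a job guarded by the host connectivity state. The decorator `requires_host_internet`, the `state` dictionary and `install_addon` are hypothetical names, and this is not Supervisor's actual code; only the wording of the warning mirrors the log line from the original report.

```python
import functools
import logging

_LOGGER = logging.getLogger("supervisor.jobs")


def requires_host_internet(get_host_internet):
    """Skip the decorated job unless get_host_internet() reports connectivity."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not get_host_internet():
                _LOGGER.warning(
                    "'%s' blocked from execution, no host internet connection",
                    func.__qualname__,
                )
                return None  # the job is not executed at all
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Stand-in for the value that would be read from NetworkManager over D-Bus.
state = {"host_internet": False}


@requires_host_internet(lambda: state["host_internet"])
def install_addon(slug):
    return f"installed {slug}"


print(install_addon("example_addon"))  # blocked while host_internet is False
state["host_internet"] = True
print(install_addon("example_addon"))  # runs once connectivity is reported
```

In this shape, a failed check against a single URL is enough to block the job even when the services the job really needs are reachable, which is the behaviour reported in this issue.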

agners avatar Mar 14 '22 09:03 agners

In HA too the D-Bus API is used to learn the connectivity state of the host system. In our case, it is the Supervisor which uses the D-Bus API to understand if connectivity on the host system is available. It is shown in ha network info as host_internet. Supervisor uses the information internally to decide if a certain job should be executed (like installing an add-on).

I noticed that the supervisor actually checks the connectivity on its own by directly querying an HTTPS service. As far as I understand, that check populates the supervisor_internet field in ha network info.

What particular need would not be satisfied from the Network Manager having its heartbeat off? I don't get why both checks have to be on, considering that the only place where the host connectivity info is used (apart from being displayed in ha network info) is the task manager, exactly where the supervisor check is done anyway. To be clear, I just want to understand the internals better. Probably there's a reason I am unaware of.

BTW, I imagine that if Network Manager exposed an ICMP connectivity check instead of an HTTP one then it would be the solution for everyone's problems in the particular context of this issue. Correct me if I'm wrong.

redgryphon avatar Mar 14 '22 11:03 redgryphon

BTW, I imagine that if Network Manager exposed an ICMP connectivity check instead of an HTTP one then it would be the solution for everyone's problems in the particular context of this issue. Correct me if I'm wrong.

Well, using ICMP would allow blocking HTTP, but obviously it still requires direct and unrestricted access to the internet for a protocol that's a) not needed for operation and b) can be blocked by any router, CDN or reverse proxy on the way to the single point of failure.

ICMP can be used to exfiltrate arbitrary payloads. You can - in theory, because I doubt it's supported by HA at this point, but let's rather try moving into this direction, not the opposite way - limit outgoing HTTPS connections by using a web proxy with a list of allowed hostnames. But you cannot reasonably restrict allowed destinations with firewall rules at any IP protocol level, because A/AAAA records in DNS may change anytime.

mtdcr avatar Mar 25 '22 17:03 mtdcr

What particular need would not be satisfied from the Network Manager having its heartbeat off?

I wouldn't call it a heartbeat; it's just a connectivity check.

I don't know the exact details, but I assume it's to cover the two slightly different DNS setups (resolvers) found in Supervisor and on the Host system: It can help to better understand where connectivity actually is missing. For an add-on installation the Docker API will be called, which in turn uses the DNS handling of the host. So it makes sense to make sure the host has internet connectivity before poking the API (which otherwise actually fails with rather cryptic errors like 500 Server Error for http+docker://localhost/v1.40/images/create?....: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)")).

That being said, as noted above, I think we should consider moving back to HTTPS despite NetworkManager's recommendation. I'd like to have a discussion with the NetworkManager folks to clarify the exact reasons why they actively discourage the use of https, but I have not gotten around to raising an issue (and, as it seems, neither has anybody complaining here).

BTW, I imagine that if Network Manager exposed an ICMP connectivity check instead of an HTTP one then it would be the solution for everyone's problems in the particular context of this issue. Correct me if I'm wrong.

Well, using ICMP would allow blocking HTTP, but obviously it still requires direct and unrestricted access to the internet for a protocol that's a) not needed for operation and b) can be blocked by any router, CDN or reverse proxy on the way to the single point of failure.

Agreed, using ICMP would certainly lead to (even more) problems. So many network environments block ICMP one way or another (unfortunately).

agners avatar Mar 30 '22 14:03 agners

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 28 '22 15:06 github-actions[bot]

Time hasn't solved this issue yet, dear bot.

mtdcr avatar Jun 28 '22 20:06 mtdcr

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 26 '22 21:09 github-actions[bot]