IPv6 Cogent connectivity issues to non-CDN hosts (api.openstreetmap.org etc.)

Open mnalis opened this issue 2 years ago • 3 comments

  • When OpenStreetMap uses IPv6 via the Fastly CDN (e.g. tile.openstreetmap.org = 2a04:4e42:39::729), there is no problem: IPv6 access works from all networks (those connected to Cogent as well as those connected to HE.net and the rest of the world).

  • However, when OpenStreetMap uses IPv6 on a Cogent-only network (www.openstreetmap.org or api.openstreetmap.org at 2001:978:2:2c::172:b / 2001:978:2:2c::172:c / 2001:978:2:2c::172:d), it is only accessible to people who peer with Cogent. Anybody else in the rest of the world who has IPv6 (and neither peers with Cogent directly nor uses an ISP which peers with Cogent) cannot access those addresses. What is worse, the connection is not simply refused (in which case IPv4 could be retried straight away); instead, connection attempts "hang", leading to annoying delays.
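To tell a hanging connection apart from one that fails quickly, curl's exit status can be used. Below is a rough sketch, assuming curl is installed; the function names and the 5-second timeout are my own choices for illustration, and the exit codes handled (6, 7, 28) are the ones curl documents for DNS failure, connect failure and timeout.

```sh
#!/bin/sh
# Map a curl exit status to a short diagnosis.
# 6 = could not resolve host, 7 = failed to connect, 28 = operation timed out.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns-failure" ;;
    7)  echo "fast-failure" ;;   # refused or unreachable: fallback can start at once
    28) echo "timeout" ;;        # the "hang" case described above
    *)  echo "other" ;;
  esac
}

# Hypothetical helper: probe a URL over IPv6 only, giving up on the
# connection attempt after 5 seconds, and print the diagnosis.
probe_v6() {
  curl -6 -s -o /dev/null --connect-timeout 5 "$1"
  classify_curl_exit "$?"
}
```

Running `probe_v6 https://api.openstreetmap.org/` from an affected network would be expected to print `timeout`, while a network that can reach Cogent should print `ok`.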

It was mentioned before in https://github.com/openstreetmap/operations/issues/482, but never resolved. It is however much more serious than stated before:

"There is a well known battle between HE and Cogent that means connections between them can be unreliable."

It is not that it (sometimes) "can be unreliable", it is that it never works at all.

mnalis avatar Aug 07 '21 03:08 mnalis

The only reasons why it doesn't draw more complaints are:

  • many (most) people do not have IPv6 at all, so are not bothered (as the problem does not hit them)
  • desktop web browsers employ the "Happy Eyeballs" (RFC 8305) strategy, which attempts IPv4 and IPv6 connections nearly simultaneously and uses whichever succeeds first (thus falling back to IPv4 whenever the rest of the world tries to connect to Cogent IPv6 address space).
  • some people actually use Cogent, so they do not experience problems when communicating over IPv6 (Cogent<->Cogent).
  • people are annoyed, but have no idea what is wrong / how to troubleshoot / who to complain to.

Note that in the mobile world, it is much worse. Apps like Vespucci and StreetComplete try to connect to api.openstreetmap.org and face a long timeout on each attempt before deciding to try another IP address (which might be another non-working IPv6 address).
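For illustration, here is a crude fallback written as a shell function. It is only a sketch of the idea and not real Happy Eyeballs: it tries IPv6 first with a short connect timeout and then retries over IPv4, rather than racing the two. The `CURL` and `V6_TIMEOUT` variables and the function name are my own assumptions, and the 1-second default is arbitrary.

```sh
#!/bin/sh
# Sequential IPv6-then-IPv4 fetch with a short IPv6 connect timeout.
# Note: this is a simple fallback, not the parallel race used by Happy Eyeballs.
CURL="${CURL:-curl}"            # overridable so the logic can be exercised offline
V6_TIMEOUT="${V6_TIMEOUT:-1}"   # seconds to wait for the IPv6 connect (assumed value)

fetch_with_fallback() {
  url="$1"
  # Try IPv6 only, but do not let a black-holed route stall us for long.
  if "$CURL" -6 -s -f --connect-timeout "$V6_TIMEOUT" "$url"; then
    return 0
  fi
  # IPv6 failed or timed out: retry over IPv4 only.
  "$CURL" -4 -s -f "$url"
}
```

Something like `fetch_with_fallback https://api.openstreetmap.org/` would then cost at most about one extra second on an affected network, instead of a full connect timeout per IPv6 address.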

mnalis avatar Aug 07 '21 03:08 mnalis

Several possible solutions to the problem exist:

  1. Cogent makes a deal with HE.net, Google and the rest of the world and peers with them. Ideal, as everybody could then use IPv6 to reach OSM servers, but unlikely to happen (but maybe if more customers complained...)
  2. OSM removes the DNS AAAA records pointing to Cogent IPv6 address space, thus returning to IPv4 only for non-CDN services. It would fix the problem, and easily, but it is the easy way out, and one might argue we should really be supporting more IPv6, not less.
  3. OSM acquires provider-independent IPv6 address space and peers via BGP with both Cogent and HE.net, thus covering the whole planet. The problem is that it is more work to implement (and costs a little extra).
  4. OSM proxies IPv6 access to api.openstreetmap.org/www.openstreetmap.org via a CDN with fully working IPv6 (as it currently does for tile.openstreetmap.org). Somewhat less work, but it possibly has its own problems (IP-based rate limiting?)
  5. each and every OSM-using app in the mobile world implements Happy Eyeballs (RFC 8305), thus falling back to IPv4 without delays. Lots of different apps and teams to persuade, though.
  6. nothing is done, and users with IPv6 (and not on Cogent) continue to be frustrated with OSM performance
  7. every non-Cogent user blackholes Cogent IPv6 address space so that connections fail fast and apps fall back to IPv4 more quickly (there are lots of users though, and not everybody has that kind of access to their routers / rooted mobile devices), e.g.
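  #!/bin/sh
  # Mark the Cogent IPv6 prefixes below as unreachable so that connections fail
  # immediately instead of hanging until they time out. Must be run as root.
  for prefix in 2001:0550::/32 2001:067c:12e8::/48 2001:0978::/32 2607:9700::/32 2607:f298:000a::/48 2607:f5d8::/32 2610:00f8:2f00::/48 2610:00f8:2fed::/48 2620:009a:8000::/48 2620:00fb::/48 2620:00fb::/56
  do
    # Errors (e.g. the route already exists) are deliberately ignored.
    ip -6 route add unreachable "$prefix" 2>/dev/null
  done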
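Since such routes have to be removed again once the underlying problem is fixed, a variant that can both add and delete them may be handier. A minimal sketch, assuming the same prefix list as in the script above; the function name and the overridable `IP` variable are hypothetical additions of mine.

```sh
#!/bin/sh
# Add or remove unreachable routes for the Cogent IPv6 prefixes listed above.
IP="${IP:-ip}"   # overridable, e.g. for a dry run that does not touch the routing table

PREFIXES="2001:0550::/32 2001:067c:12e8::/48 2001:0978::/32 2607:9700::/32
2607:f298:000a::/48 2607:f5d8::/32 2610:00f8:2f00::/48 2610:00f8:2fed::/48
2620:009a:8000::/48 2620:00fb::/48 2620:00fb::/56"

cogent_routes() {
  action="$1"   # "add" or "del"
  case "$action" in
    add|del) ;;
    *) echo "usage: cogent_routes add|del" >&2; return 2 ;;
  esac
  for prefix in $PREFIXES; do
    # Errors (route already present / already gone) are ignored, as in the original.
    "$IP" -6 route "$action" unreachable "$prefix" 2>/dev/null
  done
  return 0
}
```

As root, `cogent_routes add` installs the routes and `cogent_routes del` removes them again. Setting `IP=echo` beforehand should just print the arguments that would be passed to `ip`, without changing anything.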

What do the powers-that-be think?

mnalis avatar Aug 07 '21 03:08 mnalis

Our new data centre, which we're currently working on bringing up, deliberately uses a different network provider, and the intention is to migrate the Amsterdam data centre to a new provider as well to resolve this.

tomhughes avatar Aug 07 '21 06:08 tomhughes