
Implement fallback to IPv4 if IPv6 times out

Open thesamesam opened this issue 2 years ago • 8 comments

We get a fair amount of users who find emerge --sync hangs when refreshing keys. Often, it's the case that their network has broken IPv6 connectivity.

We should fall back to IPv4 if IPv6 times out, given how common this is.

See also https://bugs.gentoo.org/779766.

thesamesam avatar Jan 10 '23 05:01 thesamesam

Well, I've implemented explicit timeouts — I wonder if this is sufficient to get some implicit IPv4 fallback to work.

mgorny avatar Apr 29 '23 14:04 mgorny

I have the problem that my ISP very often breaks IPv6 or IPv4, which forces me to explicitly choose one of the two. Here is what I do to make gemato work with IPv4 only:


diff --git a/gemato/openpgp.py b/gemato/openpgp.py
index 483e15f..3543f81 100644
--- a/gemato/openpgp.py
+++ b/gemato/openpgp.py
@@ -37,6 +37,7 @@ from gemato.exceptions import (
 
 try:
     import requests
+    requests.packages.urllib3.util.connection.HAS_IPV6 = False
 except ImportError:
     requests = None

Ofc, this is ugly, and we can make a nice /etc/portage/make.conf option out of this ...

If interested, I can contribute a patch.
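A rough sketch of how such an opt-in could be wired up, purely as an illustration. The variable name `GEMATO_FORCE_IPV4` and both helper functions are made up for this example; neither gemato nor Portage defines them. `HAS_IPV6` is the same module-level urllib3 attribute the diff above flips, not a documented setting. A stand-in object is used in place of the real module so the snippet runs without requests installed.

```python
import os
from types import SimpleNamespace

# Hypothetical variable name, invented for this sketch.
ENV_VAR = "GEMATO_FORCE_IPV4"


def ipv4_only_requested(environ=None):
    """Return True if the (hypothetical) opt-in variable is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


def maybe_disable_ipv6(connection_module, environ=None):
    """Flip HAS_IPV6 on the given module-like object only when opted in."""
    if ipv4_only_requested(environ):
        connection_module.HAS_IPV6 = False
        return True
    return False


# Stand-in for requests.packages.urllib3.util.connection.
fake_connection = SimpleNamespace(HAS_IPV6=True)
changed = maybe_disable_ipv6(fake_connection, {ENV_VAR: "1"})
print(changed, fake_connection.HAS_IPV6)
```

In gemato this would presumably be called right after `import requests`, passing `requests.packages.urllib3.util.connection` as the module argument.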

oz123 avatar May 15 '23 18:05 oz123

How is an option supposed to be nice when it can only be implemented using an ugly hack?

mgorny avatar May 15 '23 18:05 mgorny

Well, beauty is in the eye of the beholder. The urllib3 developers offer this switch for choosing IPv4. A command-line flag for emerge --sync or an environment variable would be welcomed by many users.

oz123 avatar May 16 '23 08:05 oz123

The urllib3 developers offer this switch for choosing IPv4.

Is this documented anywhere? It looks like an implementation detail and not a public-facing "switch".

mgorny avatar May 16 '23 11:05 mgorny

You are right. It's not something public.

oz123 avatar May 16 '23 11:05 oz123

I am so happy to have come across this. I have been having issues with emerge --sync getting stuck at Refreshing keys via WKD ...

After some investigating, I found that my network was getting an IPv6 assignment via DHCP from my ISP and IPv6 DNS was working; however, any IPv6 traffic (e.g. ping -6 [URL | IPv6 address]) would get no response.

So I did some digging into Portage's source code.

I found the following code snippet:

def _refresh_keys(self, openpgp_env):
    """
    Refresh keys stored in openpgp_env. Raises gemato.exceptions.GematoException
    or asyncio.TimeoutError on failure.

    @param openpgp_env: openpgp environment
    @type openpgp_env: gemato.openpgp.OpenPGPEnvironment
    """

    if openpgp_env.refresh_keys_wkd():
        out.eend(0)
        return
    out.eend(1)

This seems to lead to this class here:

class IsolatedGPGEnvironment(SystemGPGEnvironment):
    """
    An isolated environment for OpenPGP routines. Used to get reliable
    verification results independently of user configuration.

    Remember to close() in order to clean up the temporary directory,
    or use as a context manager (via 'with').
    """

    def __init__(self, debug=False, proxy=None, timeout=None):
        super().__init__(debug=debug)
        self.proxy = proxy
        self.timeout = timeout
        self._home = tempfile.mkdtemp(prefix='gemato.')

Based on this information it looks like a user can set a timeout somehow due to the following code:

if timeout is not None:
    # GPG doesn't accept sub-second timeouts
    gpg_timeout = math.ceil(timeout)
    f.write(f"""
# respect user-specified timeouts
resolver-timeout {gpg_timeout}
connect-timeout {gpg_timeout}
""")
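To make the rounding concrete, here is a minimal standalone sketch of what that snippet produces. The function name `gpg_timeout_lines` is invented for this example and is not part of gemato; it only mirrors the `math.ceil` step and the two option lines shown above.

```python
import math


def gpg_timeout_lines(timeout):
    """Return the config lines written for a timeout in seconds ("" if unset)."""
    if timeout is None:
        return ""
    # Sub-second values are rounded up to whole seconds, as in the snippet above.
    gpg_timeout = math.ceil(timeout)
    return (
        "# respect user-specified timeouts\n"
        f"resolver-timeout {gpg_timeout}\n"
        f"connect-timeout {gpg_timeout}\n"
    )


print(gpg_timeout_lines(0.5), end="")
```

So a timeout of 0.5 ends up as 1 second on the GPG side, while the same value is passed unrounded to requests.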

However, I am unable to find any documentation indicating that this can be set via a command-line option or a .conf file. Admittedly, I haven't done much digging into the source code.

After reviewing the refresh_keys_wkd method I found line 589.

resp = requests.get(url, proxies=proxies, timeout=self.timeout)

I then did further digging into the source code of the requests module and of the urllib3 module that requests uses. At this point it got a bit over my head, but this is what I was able to take away from it:

  • Gemato uses the Requests module to handle the request.
  • The Requests module then uses the urllib3 module.
  • urllib3 should then time out after its default time, as Gemato doesn't seem to pass a default for requests to pass on.

For me, the request eventually times out and the sync continues without verifying the keys.

I have a similar issue with checking repositories; however, there the IPv6 connections time out and fall back to IPv4.

The only possible solution I can think of now is to modify refresh_keys_wkd() so that resp = requests.get(url, proxies=proxies, timeout=self.timeout) is first called against a resolved IPv6 address (if obtainable) with a short timeout (e.g. 10 seconds) and, if that attempt times out, retried against an IPv4 address. One caveat: I was unable to determine whether requests.get accepts an IP address or requires a hostname.
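The "try IPv6 briefly, then IPv4" idea can be sketched with the standard library alone, as a connectivity probe rather than a change to the requests call. This is only an illustration: `connect_with_fallback` is a made-up helper, not gemato code, and it tries each address family in order with a short per-attempt timeout.

```python
import socket


def connect_with_fallback(host, port, timeout=10.0,
                          families=(socket.AF_INET6, socket.AF_INET)):
    """Try each address family in order; return (sock, family) for the first
    address that accepts a TCP connection within `timeout` seconds."""
    last_error = None
    for family in families:
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except OSError as exc:  # includes socket.gaierror: no address in this family
            last_error = exc
            continue
        for af, socktype, proto, _canonname, sockaddr in infos:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
            except OSError as exc:  # includes timeouts and unreachable networks
                last_error = exc
                sock.close()
                continue
            return sock, af
    raise last_error or OSError(f"no usable address found for {host!r}")
```

A caller could close the returned socket and use the winning family to decide what to try next. Passing a bare IP address to requests.get for an https URL is a separate problem, since certificate hostname checks and SNI normally rely on the hostname, so the probe result cannot simply be pasted into the URL.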

Sorry for the lengthy message. I hope this is of use to someone else trying to find a solution to this issue, as I spent a lot of time troubleshooting my network and then digging around for a fix before stumbling upon this post. Also, if I made any mistakes feel free to let me know; it's late and I'm very tired.

EDIT: I also posted this info in case it helps with the message @mgorny posted:

Well, I've implemented explicit timeouts — I wonder if this is sufficient to get some implicit IPv4 fallback to work.

EDIT2: fixed area stating it eventually falls back to IPv4 as that's emerge-webrsync.

ModernKiwi avatar Aug 02 '23 11:08 ModernKiwi

I am so happy to have come across this. I have been having issues with emerge --sync getting stuck at Refreshing keys via WKD ...

Yeah, it's really frustrating. I've added --debug support to Portage, with mgorny's help on the gemato side, which at least helps here.

After some investigating, I found that my network was getting an IPv6 assignment via DHCP from my ISP and IPv6 DNS was working; however, any IPv6 traffic (e.g. ping -6 [URL | IPv6 address]) would get no response.

So I did some digging into Portage's source code.

I found the following code snippet:

def _refresh_keys(self, openpgp_env):
    """
    Refresh keys stored in openpgp_env. Raises gemato.exceptions.GematoException
    or asyncio.TimeoutError on failure.

    @param openpgp_env: openpgp environment
    @type openpgp_env: gemato.openpgp.OpenPGPEnvironment
    """

    if openpgp_env.refresh_keys_wkd():
        out.eend(0)
        return
    out.eend(1)

This seems to lead to this class here:

class IsolatedGPGEnvironment(SystemGPGEnvironment):
    """
    An isolated environment for OpenPGP routines. Used to get reliable
    verification results independently of user configuration.

    Remember to close() in order to clean up the temporary directory,
    or use as a context manager (via 'with').
    """

    def __init__(self, debug=False, proxy=None, timeout=None):
        super().__init__(debug=debug)
        self.proxy = proxy
        self.timeout = timeout
        self._home = tempfile.mkdtemp(prefix='gemato.')

Based on this information it looks like a user can set a timeout somehow due to the following code:

Yeah, it works as a command line arg --timeout (see e.g. gemato gpg-wrap -h) or via the class constructor (for use via e.g. Portage).

if timeout is not None:
    # GPG doesn't accept sub-second timeouts
    gpg_timeout = math.ceil(timeout)
    f.write(f"""
# respect user-specified timeouts
resolver-timeout {gpg_timeout}
connect-timeout {gpg_timeout}
""")

However, I am unable to find any documentation indicating that this can be set via a command-line option or a .conf file. Admittedly, I haven't done much digging into the source code.

After reviewing the refresh_keys_wkd method I found line 589.

resp = requests.get(url, proxies=proxies, timeout=self.timeout)

I then did further digging into the source code of the requests module and of the urllib3 module that requests uses. At this point it got a bit over my head, but this is what I was able to take away from it:

I was hoping we could just ask requests to fall back and use that to influence what we tell gpg to use later on, but apparently not: https://github.com/psf/requests/issues/1691.

* Gemato uses the [Requests](https://github.com/psf/requests/tree/main) module to handle the request.
* The Requests module then uses the [urllib3](https://github.com/urllib3/urllib3/tree/main) module.
* urllib3 should then time out after its default time, as Gemato doesn't seem to pass a default for requests to pass on.

For me, the request eventually times out and the sync continues without verifying the keys.

I have a similar issue with checking repositories; however, there the IPv6 connections time out and fall back to IPv4.

FWIW, if it's that bad (not just affecting gpg's custom getaddrinfo), you should disable IPv6 fully (assuming fixing your network isn't an option) and/or edit /etc/gai.conf to force IPv4.

But I agree it'd be nice to have something here given it's not easy for people to see what's going on at all.

The only possible solution I can think of now is to modify refresh_keys_wkd() so that resp = requests.get(url, proxies=proxies, timeout=self.timeout) is first called against a resolved IPv6 address (if obtainable) with a short timeout (e.g. 10 seconds) and, if that attempt times out, retried against an IPv4 address. One caveat: I was unable to determine whether requests.get accepts an IP address or requires a hostname.

Right, we could do our own probe and use the resolved names. It's not as elegant as I was hoping for, though.

I'm left wondering if we should just add a timer for Portage's refresh stage where it'll tell you to check your IPv6 connectivity if it takes >= 30s or so.
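Something like the following is what I have in mind for the timer, as a minimal sketch only. `warn_if_slow` is a made-up helper, not existing Portage code, and the 30s threshold and message text are placeholders.

```python
import contextlib
import threading


@contextlib.contextmanager
def warn_if_slow(seconds, message, emit=print):
    """Call emit(message) if the wrapped block is still running after `seconds`."""
    timer = threading.Timer(seconds, emit, args=(message,))
    timer.daemon = True  # never keep the process alive just for the warning
    timer.start()
    try:
        yield
    finally:
        # Cancelling is harmless if the timer has already fired.
        timer.cancel()


# Usage sketch: wrap the slow step; a fast block prints nothing.
with warn_if_slow(30, "Key refresh is taking a while; check your IPv6 connectivity."):
    pass  # the key refresh would run here
```

This doesn't fix the hang, it just tells the user where to look while the refresh is still running.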

thesamesam avatar Aug 19 '23 14:08 thesamesam