'127.0.0.1:8329' parsed wrong in Python 3.9+

Open bityob opened this issue 2 years ago • 3 comments

Python 3.9.13 (main, Jun 19 2022, 13:12:56) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import furl
>>> furl.furl('127.0.0.1:8329').url
'8329'
>>> from six.moves import urllib
>>> urllib.parse.urlsplit('127.0.0.1:8329')
SplitResult(scheme='127.0.0.1', netloc='', path='8329', query='', fragment='')

Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import furl
>>> furl.furl('127.0.0.1:8329').url
'127.0.0.1:8329'
>>> from six.moves import urllib
>>> urllib.parse.urlsplit('127.0.0.1:8329')
SplitResult(scheme='', netloc='', path='127.0.0.1:8329', query='', fragment='')

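This is caused by a change in Python 3.9 (https://github.com/python/cpython/issues/71844) to how urlsplit fills the scheme field when the input has no scheme: the text before the first ':' ('127.0.0.1') is now treated as the scheme, and only '8329' is left as the path.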
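For illustration, when the input is known to be a bare host:port string, the ambiguity can be avoided with the standard library alone by prefixing '//', which marks the text as a network location. This is only a minimal sketch under that assumption, not what furl does; `split_host_port` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def split_host_port(address):
    """Hypothetical helper: split a bare 'host:port' string.

    Assumes `address` carries no scheme. Prefixing '//' tells urlsplit
    that the text is a network location, so the host is never mistaken
    for a scheme, whichever Python 3 version is running.
    """
    parts = urlsplit("//" + address)
    return parts.hostname, parts.port


host, port = split_host_port("127.0.0.1:8329")
print(host, port)
```

The sketch would misparse input that already has a scheme, so it is not a general replacement for urlsplit.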

The Python 3.9 change also broke requests, which had to switch its URL parsing to urllib3 (https://github.com/psf/requests/pull/5917).

I suggest fixing this in furl with urllib3 as well.

bityob, Jun 19 '22 11:06

This is the workaround we use in our code until furl is fixed.

import logging

import furl

logger = logging.getLogger(__name__)

def urlsplit_based_on_urllib3(url):
    """
    Return the fields that `urllib.parse.urlsplit` returned before Python 3.9,
    as a plain tuple (missing fields are None rather than '').

    >>> urlsplit_based_on_urllib3('127.0.0.1:8329')
    (None, None, '127.0.0.1:8329', None, None)
    """
    from urllib3.util import parse_url
    u = parse_url(url)
    if u.netloc and not u.path:
        return u.scheme, None, u.netloc, u.query, u.fragment
    return u.scheme, u.netloc, u.path, u.query, u.fragment


try:
    furl.urllib.parse.urlsplit = urlsplit_based_on_urllib3
except Exception:  # a bare except would also swallow KeyboardInterrupt
    logger.error("Failed to fix furl urlsplit usage")
>>> furl.furl('127.0.0.1:8329')
furl('127.0.0.1:8329')

bityob, Jun 19 '22 15:06

thank you for opening this issue! great catch, and thank you for providing the super helpful links for context

let's fix this; consistency is key. do you have time to submit a PR that replaces furl's version of urlsplit (https://github.com/gruns/furl/blob/master/furl/furl.py#L284), which is built on six.moves.urllib.parse.urlsplit(), with one based on urllib3?

thank you!

gruns, Jun 28 '22 07:06

Using urllib3 may not be ideal because that project officially supports only HTTP URLs. This can cause problems with other schemes. For example, when parsing a tel URL with urllib3, a leading '/' is prepended to the phone number.
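For comparison, here is a small sketch of how the standard library's urlsplit handles a tel URL. It only shows the stdlib side; the urllib3 behavior is as described above and is not reproduced here. The phone number is a made-up example, not one taken from this thread.

```python
from urllib.parse import urlsplit

# Example number chosen for illustration only.
parts = urlsplit("tel:+1-201-555-0123")

print(parts.scheme)  # the scheme is recognized as 'tel'
print(parts.path)    # the number is kept as the path, with no '/' added
```

A urlsplit replacement for furl would need to preserve this behavior for non-HTTP schemes.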

kenballus, Feb 02 '23 13:02