rfc3986 Incorrect result returned by is_valid

trafficstars

According to RFC 3986, there are some ABNF rules:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / ".")

And I tried:

>>> from rfc3986 import is_valid_uri
>>> is_valid_uri("1")
True
>>> is_valid_uri("//")
True
>>> is_valid_uri("http#:/")
True

I think these results are not correct.

Feb 13 '18 07:02 win911

Really this method should be is_valid_uri_reference(), but each of these are valid URI references.

# A no-scheme path with zero extra segments
>>> rfc3986.uri_reference('1')
URIReference(scheme=None, authority=None, path='1', query=None, fragment=None)

# A relative-ref with an empty path and no scheme.
>>> rfc3986.uri_reference('//')
URIReference(scheme=None, authority=None, path=None, query=None, fragment=None)

# A relative-ref with a non-empty path and a fragment.
>>> rfc3986.uri_reference('http#:/')
URIReference(scheme=None, authority=None, path='http', query=None, fragment=':/')

Dec 04 '18 16:12 sethmlarson

Right. There's a far better and more comprehensive way of doing validations

Dec 04 '18 18:12 sigmavirus24

https://rfc3986.readthedocs.io/en/latest/user/validating.html

Dec 04 '18 18:12 sigmavirus24

rfc3986.is_valid_uri("Hi Super Nintendo Chalmers") # -> True
rfc3986.is_valid_uri("My cat's breath smells like cat food.") # -> True
rfc3986.is_valid_uri("I'm learnding!") # -> True

Something is deeply wrong here.

Feb 07 '19 23:02 bugeats

@bugeats All of those examples pass the validation check because URIReference.from_string() does normalization which converts characters within components that can be percent-encoded into their percent-encoded counterparts before calling is_valid() (Meaning all of that text ends up being a percent encoded path). And because there aren't any kwargs to "require" any component in particular we get a result of True.

Using the Validator interface makes this all more clear. You can achieve what I believe you want via:

def is_valid_uri_with_scheme_and_host(uri):
    try:
        uri = rfc3986.URIReference.from_string(uri)
        rfc3986.validators.Validator().require_presence_of("scheme", "host").check_validity_of("scheme", "host").validate(uri)
        return True
    except MissingComponentError, InvalidComponentsError:
        return False

Will return False for all your examples but True for https://www.google.com/ and the like.

Feb 08 '19 01:02 sethmlarson

Yes I eventually figured that out, thanks for the detailed response. I mostly wanted to point out the absurdity of the code as it reads. I humbly suggest that you reconsider the name of that method.

Feb 08 '19 17:02 bugeats

Or just remove it, it's been deprecated for a few releases.

Feb 08 '19 18:02 sethmlarson

I want to use this library to detect foo bar (with a space) as an invalid URI with regard to RFC 3986. All parts of the API of this library that I have seen so far auto-correct foo bar to foo%20bar for me. How can I disable that behavior and get an exception, instead? What class and functions can I use for strict parsing behavior if is_valid_uri and .is_valid are not strict? Thanks!

Sep 13 '19 21:09 hartwork

Anyone?

Feb 27 '22 17:02 hartwork

Almost a year late to the party here.

If you don't mind using something home-grown, you could try using it. I went through RFC3986 and converted all the ABNF to regexes. I left the ABNF in comments above the regexes so you can verify the code for yourself.

Jan 29 '23 03:01 kenballus

rfc3986
rfc3986 copied to clipboard

Incorrect result returned by is_valid_uri()

rfc3986 rfc3986 copied to clipboard

Incorrect result returned by is_valid_uri()

rfc3986
rfc3986 copied to clipboard