[🚀 Feature]: [py] Validate URLs before navigation
Description
When you call driver.get() or driver.browsing_context.navigate(), Selenium attempts to navigate to the URL even if it is malformed.
Browsers don't handle this very well. For example...
if you do:
driver.get("example.com")
or
driver.get("http//example.com")
Chrome silently does not navigate and returns no error (Firefox does return an error).
Proposed change:
If we validate the URL before attempting navigation, we can raise a useful exception: raise InvalidArgumentException("Invalid URL").
Here is some example code for validation:
from urllib.parse import urlparse

def is_valid_url(url):
    try:
        result = urlparse(url)
        return bool(result.scheme)
    except AttributeError:
        return False
This validates it can be parsed as a URL and contains a scheme.
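To see why the scheme check catches both examples from the description, here is a small sketch that prints what urlparse() extracts from each one. The third URL is added here only as a well-formed comparison; it is not from the original report.

```python
from urllib.parse import urlparse


def is_valid_url(url):
    """Return True if the URL parses and has a non-empty scheme."""
    try:
        result = urlparse(url)
        return bool(result.scheme)
    except AttributeError:
        return False


# The first two are the malformed examples from the description;
# the last one is a well-formed URL added for comparison.
for candidate in ("example.com", "http//example.com", "https://example.com"):
    parts = urlparse(candidate)
    print(
        f"{candidate!r}: scheme={parts.scheme!r}, "
        f"path={parts.path!r}, valid={is_valid_url(candidate)}"
    )
```

Neither malformed string contains a colon, so urlparse() treats the whole string as a path and leaves the scheme empty.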
Have you considered any alternatives or workarounds?
No response
Does this apply to specific language bindings?
Python
What part(s) of Selenium does this relate to?
No response
@cgoldberg, thank you for creating this issue. We will troubleshoot it as soon as we can.
This is what the spec says:
If URL is not an absolute URL or is not an absolute URL with fragment or not a local scheme, return error with error code invalid argument.
Technically, users can create valid local schemes of their own that aren't in VALID_URL_SCHEMES, so I don't think we should prevent those. We should also verify that URLs with fragments (https://www.example.com/documentation.html#installation) and queries (https://www.example.com/documentation.html?foo=bar) pass the parse (I suspect they do).
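A quick way to confirm the suspicion about fragments and queries is to parse the two example URLs and inspect the result. This is only a sanity check with the standard library, not part of any proposed patch.

```python
from urllib.parse import urlparse

# The two example URLs from the comment above.
with_fragment = urlparse("https://www.example.com/documentation.html#installation")
with_query = urlparse("https://www.example.com/documentation.html?foo=bar")

# Both keep their scheme and netloc; the fragment and query are split out
# into their own fields rather than breaking the parse.
print(with_fragment.scheme, with_fragment.netloc, with_fragment.fragment)
print(with_query.scheme, with_query.netloc, with_query.query)
```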
Re: "users can create valid local schemes of their own that aren't in VALID_URL_SCHEMES"
So I guess we can't validate the URL scheme. Maybe it should just try to parse the URL, verify that it has a scheme and a netloc, and let everything else through.
The validation would just be:
def is_valid_url(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except AttributeError:
        return False
about:blank is a valid URL in Chrome but fails the check above.
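The failure can be reproduced with the standard library alone. The sketch below applies the scheme-and-netloc rule to about:blank; the file: URL is an extra example added here (not mentioned in the thread) to show that other URLs without an authority part are rejected the same way.

```python
from urllib.parse import urlparse


def has_scheme_and_netloc(url):
    """The stricter rule: require both a scheme and a netloc."""
    result = urlparse(url)
    return all([result.scheme, result.netloc])


# about:blank has a scheme ("about") but no netloc, so the stricter rule rejects it.
# The file: URL is an illustrative addition; its netloc is empty as well.
for candidate in ("about:blank", "file:///tmp/page.html", "https://example.com"):
    parts = urlparse(candidate)
    print(
        f"{candidate!r}: scheme={parts.scheme!r}, netloc={parts.netloc!r}, "
        f"passes={has_scheme_and_netloc(candidate)}"
    )
```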
@emanlove thanks.. you're right.
We could use:
def is_valid_url(url):
    try:
        result = urlparse(url)
        return bool(result.scheme)
    except AttributeError:
        return False
... that's not a lot of validation, but it would save users from being confused when driver.get("example.com") neither navigates nor raises an exception in Chrome/Edge.
I'm not sure this is even worth doing though.
I think it would be better to leave that decision to the end user implementing Selenium. But this could make a good blog post or knowledge article for the community.
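For users who want this check in their own code, a minimal sketch could look like the following. The names checked_get and InvalidURLError are hypothetical and not part of Selenium; InvalidURLError is a local stand-in so the example runs without Selenium installed, and a real project might prefer to raise Selenium's InvalidArgumentException instead. Catching ValueError is also an addition beyond the version in this thread, since urlparse() raises it for some malformed inputs such as an unclosed IPv6 bracket.

```python
from urllib.parse import urlparse


class InvalidURLError(ValueError):
    """Hypothetical stand-in exception; not a Selenium class."""


def is_valid_url(url):
    """Return True if the URL parses and has a non-empty scheme."""
    try:
        return bool(urlparse(url).scheme)
    except (AttributeError, ValueError):
        # AttributeError: non-string input such as an int.
        # ValueError (assumption beyond the thread): e.g. "http://[::1".
        return False


def checked_get(driver, url):
    """Hypothetical wrapper: navigate only if the URL has a scheme."""
    if not is_valid_url(url):
        raise InvalidURLError(f"Invalid URL: {url!r}")
    driver.get(url)
```

With this wrapper, checked_get(driver, "example.com") raises before the request ever reaches the browser, while about:blank and custom local schemes still pass through.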
Now that we're moving to BiDi by default in Selenium 5, we could possibly check for error codes in driver.get().