jsonschema
jsonschema copied to clipboard
hostname format checker might raise ValueError
At least on empty string format: hostname
validator raises ValueError
and not the expected ValidationError
.
>>> from jsonschema.validators import validator_for
>>> schema = {"$schema": "https://json-schema.org/draft/2020-12/schema", "type": "string", "format": "hostname"}
>>> vcls = validator_for(schema)
>>> validator = vcls(schema, format_checker=vcls.FORMAT_CHECKER)
>>> validator.validate("")
....
for error in errors:
File ".../jsonschema/_validators.py", line 238, in format
validator.format_checker.check(instance, format)
File ".../jsonschema/_format.py", line 135, in check
result = func(instance)
^^^^^^^^^^^^^^
File ".../jsonschema/_format.py", line 276, in is_host_name
return FQDN(instance).is_valid
^^^^^^^^^^^^^^
File ".../fqdn/__init__.py", line 44, in __init__
raise ValueError("fqdn must be str")
ValueError: fqdn must be str
This also fails on iter_errors()
usage.
This might be enough to fix it:
--- a/jsonschema/_format.py
+++ b/jsonschema/_format.py
@@ -270,6 +270,7 @@ with suppress(ImportError):
draft7="hostname",
draft201909="hostname",
draft202012="hostname",
+ raises=ValueError,
)
def is_host_name(instance: object) -> bool:
if not isinstance(instance, str):
Sorry, no time to properly test + send a PR at this point in time.
Thanks, indeed the first step here is to add a test to the official test suite. Help welcome whenever you care to.
Discussion in https://github.com/json-schema-org/JSON-Schema-Test-Suite/pull/677 pointed out that zero length strings are actually valid in the scope of given RFC's. So, my initial easy PR needs a bit more work.
As the zero length / empty string is valid, also idn-hostname
fails in at least the following scenarios (using the underlying idna
library for some quick checks):
>>> import idna
>>> idna.encode("")
idna.core.IDNAError: Empty domain
>>> idna.encode(".")
idna.core.IDNAError: Empty Label
To be continued...
@Julian I think the easiest patch for now is to do a fast pre-check / return True
on special cases on jsonschema
, before calling the external libraries.
Something like
if instance in ("", "."):
return True
At the moment idna
raises exceptions on those too legit cases, and fqdn
gives out the ValueError
.
Even with that fix I think it makes sense to add raises=ValueError
to the hostname
/ fqdn
check.
:thinking:
I think the easiest patch for now is to do a fast pre-check
Yep that seems right to me.
Even with that fix I think it makes sense to add raises=ValueError to the hostname / fqdn check.
I'm somewhat OK with this but can you try to read through the source code of those libs and see whether there appear to be any explicit code paths which will really hit that? As I say I'm probably OK adding it regardless, but it's a bit nice to be thorough while we're here.
And then there is the Python standard lib idna
encoding... oh my :exploding_head:
>>> "hello".encode("idna")
b'hello'
>>> "hello.".encode("idna")
b'hello.'
>>> "".encode("idna")
b''
>>> ".".encode("idna")
Traceback (most recent call last):
File "lib/python3.11/encodings/idna.py", line 163, in encode
raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
I need a short break from all this... maybe resuming tomorrow.
I have encountered this regression when upgrading from 3.0.1 to 4.21.1 (a big jump as older versions don't support python 3.12) Where the schema includes:
"address": {
"type": "string",
"anyOf": [
{
"format": "hostname"
},
{
"format": "ipv4"
},
{
"format": "ipv6"
}
]
},
Previously passing an empty string would have resulted in a 'format' error, it now leaks the ValueError exception instead of being able to enumerate all the errors in the document being validated.
errors = list(self.VALIDATOR.iter_errors(spec))
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:371: in iter_errors
for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:340: in anyOf
errs = list(validator.descend(instance, subschema, schema_path=index))
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:226: in format
validator.format_checker.check(instance, format)
tests\.virtualenv\Lib\site-packages\jsonschema\_format.py:137: in check
result = func(instance)
tests\.virtualenv\Lib\site-packages\jsonschema\_format.py:277: in is_host_name
return FQDN(instance, min_labels=1).is_valid
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def __init__(self, fqdn, *nothing, **kwargs):
if nothing:
raise ValueError("got extra positional parameter, try kwargs")
unknown_kwargs = set(kwargs.keys()) - {"allow_underscores", "min_labels"}
if unknown_kwargs:
raise ValueError("got extra kwargs: {}".format(unknown_kwargs))
if not (fqdn and isinstance(fqdn, str)):
> raise ValueError("fqdn must be str")
E ValueError: fqdn must be str
I've written a longer comment on the tests as to what I think the behaviour should be, but not "crashing" (leaking exceptions) when enumerating errors would be a good start here.
I believe that https://github.com/ypcrts/fqdn/issues/45 is the root cause, but catching a ValueError defensively here is probably the right thing to do.
Thanks for filing that. Yeah I'm happy with catching the ValueError
in the interim (but I'd wait till the test is merged upstream).