jsonschema icon indicating copy to clipboard operation
jsonschema copied to clipboard

hostname format checker might raise ValueError

Open jvtm opened this issue 1 year ago • 8 comments

At least on empty string format: hostname validator raises ValueError and not the expected ValidationError.

>>> from jsonschema.validators import validator_for
>>> schema = {"$schema": "https://json-schema.org/draft/2020-12/schema", "type": "string", "format": "hostname"}
>>> vcls = validator_for(schema)
>>> validator = vcls(schema, format_checker=vcls.FORMAT_CHECKER)
>>> validator.validate("")
....
    for error in errors:
  File ".../jsonschema/_validators.py", line 238, in format
    validator.format_checker.check(instance, format)
  File ".../jsonschema/_format.py", line 135, in check
    result = func(instance)
             ^^^^^^^^^^^^^^
  File ".../jsonschema/_format.py", line 276, in is_host_name
    return FQDN(instance).is_valid
           ^^^^^^^^^^^^^^
  File ".../fqdn/__init__.py", line 44, in __init__
    raise ValueError("fqdn must be str")
ValueError: fqdn must be str

This also fails on iter_errors() usage.

This might be enough to fix it:

--- a/jsonschema/_format.py
+++ b/jsonschema/_format.py
@@ -270,6 +270,7 @@ with suppress(ImportError):
         draft7="hostname",
         draft201909="hostname",
         draft202012="hostname",
+        raises=ValueError,
     )
     def is_host_name(instance: object) -> bool:
         if not isinstance(instance, str):

Sorry, no time to properly test + send a PR at this point in time.

jvtm avatar Jul 10 '23 18:07 jvtm

Thanks, indeed the first step here is to add a test to the official test suite. Help welcome whenever you care to.

Julian avatar Jul 11 '23 12:07 Julian

Discussion in https://github.com/json-schema-org/JSON-Schema-Test-Suite/pull/677 pointed out that zero length strings are actually valid in the scope of given RFC's. So, my initial easy PR needs a bit more work.

As the zero length / empty string is valid, also idn-hostname fails in at least the following scenarios (using the underlying idna library for some quick checks):

>>> import idna
>>> idna.encode("")
idna.core.IDNAError: Empty domain
>>> idna.encode(".")
idna.core.IDNAError: Empty Label

To be continued...

jvtm avatar Jul 13 '23 12:07 jvtm

@Julian I think the easiest patch for now is to do a fast pre-check / return True on special cases on jsonschema, before calling the external libraries.

Something like

if instance in ("", "."):
    return True

At the moment idna raises exceptions on those too legit cases, and fqdn gives out the ValueError.

Even with that fix I think it makes sense to add raises=ValueError to the hostname / fqdn check.

:thinking:

jvtm avatar Jul 13 '23 12:07 jvtm

I think the easiest patch for now is to do a fast pre-check

Yep that seems right to me.

Even with that fix I think it makes sense to add raises=ValueError to the hostname / fqdn check.

I'm somewhat OK with this but can you try to read through the source code of those libs and see whether there appear to be any explicit code paths which will really hit that? As I say I'm probably OK adding it regardless, but it's a bit nice to be thorough while we're here.

Julian avatar Jul 13 '23 12:07 Julian

And then there is the Python standard lib idna encoding... oh my :exploding_head:

>>> "hello".encode("idna")
b'hello'

>>> "hello.".encode("idna")
b'hello.'

>>> "".encode("idna")
b''

>>> ".".encode("idna")
Traceback (most recent call last):
  File "lib/python3.11/encodings/idna.py", line 163, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

I need a short break from all this... maybe resuming tomorrow.

jvtm avatar Jul 13 '23 12:07 jvtm

I have encountered this regression when upgrading from 3.0.1 to 4.21.1 (a big jump as older versions don't support python 3.12) Where the schema includes:

"address": {
    "type": "string",
    "anyOf": [
        {
            "format": "hostname"
        },
        {
            "format": "ipv4"
        },
        {
            "format": "ipv6"
        }
    ]
},

Previously passing an empty string would have resulted in a 'format' error, it now leaks the ValueError exception instead of being able to enumerate all the errors in the document being validated.

    errors = list(self.VALIDATOR.iter_errors(spec))
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:371: in iter_errors
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:340: in anyOf
    errs = list(validator.descend(instance, subschema, schema_path=index))
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:226: in format
    validator.format_checker.check(instance, format)
tests\.virtualenv\Lib\site-packages\jsonschema\_format.py:137: in check
    result = func(instance)
tests\.virtualenv\Lib\site-packages\jsonschema\_format.py:277: in is_host_name
    return FQDN(instance, min_labels=1).is_valid
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _


    def __init__(self, fqdn, *nothing, **kwargs):
        if nothing:
            raise ValueError("got extra positional parameter, try kwargs")
        unknown_kwargs = set(kwargs.keys()) - {"allow_underscores", "min_labels"}
        if unknown_kwargs:
            raise ValueError("got extra kwargs: {}".format(unknown_kwargs))

        if not (fqdn and isinstance(fqdn, str)):
>           raise ValueError("fqdn must be str")
E           ValueError: fqdn must be str

I've written a longer comment on the tests as to what I think the behaviour should be, but not "crashing" (leaking exceptions) when enumerating errors would be a good start here.

shane-kearns avatar Mar 04 '24 14:03 shane-kearns

I believe that https://github.com/ypcrts/fqdn/issues/45 is the root cause, but catching a ValueError defensively here is probably the right thing to do.

shane-kearns avatar Mar 04 '24 18:03 shane-kearns

Thanks for filing that. Yeah I'm happy with catching the ValueError in the interim (but I'd wait till the test is merged upstream).

Julian avatar Mar 04 '24 18:03 Julian