referencing icon indicating copy to clipboard operation
referencing copied to clipboard

Resolving a `$ref` to a nested definition can fail to catch that a (bad) subschema is neither bool nor object, resulting in an `AttributeError`

Open sirosen opened this issue 5 months ago • 4 comments

This originates downstream in https://github.com/python-jsonschema/check-jsonschema/issues/376 . I was just able to look at the offending schema and trace things out enough to find the true cause. I've stuck with using the CLI for my reproducers, but I could work up some python for this if it would be useful.

check-jsonschema is using jsonschema+referencing. It keeps the jsonschema behavior of checking a schema against its metaschema. Therefore, the following schema is caught as invalid:

invalid-caught-by-metaschema
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "definitions": {
    "foo": "invalid"
  },
  "properties": {
    "foo": {
      "$ref": "#/definitions/foo"
    }
  }
}

(Note how the definition for "foo" is a string, rather than a schema.)


However, if the invalid definition is nested, as follows, the metaschema check is not sufficient:

invalid-not-caught-by-metaschema
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "definitions": {
    "sub": {
      "foo": "invalid"
    }
  },
  "properties": {
    "foo": {
      "$ref": "#/definitions/sub/foo"
    }
  }
}

As a result, it is possible, with jsonschema+referencing, to be handling this in a validator. The next step is to try to use it. With the check-jsonschema CLI, we get the following trace (trimmed to relevant parts):

full-cli-traceback
$ check-jsonschema --schemafile badrefschema.json <(echo '{"foo": "bar"}')  
Traceback (most recent call last):
  
  <<<<<check-jsonschema invocation path shows here>>>>>

  File "/home/sirosen/projects/jsonschema/check-jsonschema/src/check_jsonschema/checker.py", line 73, in _build_result
    for err in validator.iter_errors(data):
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 368, in iter_errors
    for error in errors:
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_keywords.py", line 295, in properties
    yield from validator.descend(
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 416, in descend
    for error in errors:
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_keywords.py", line 274, in ref
    yield from validator._validate_reference(ref=ref, instance=instance)
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 410, in descend
    for k, v in applicable_validators(schema):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_legacy_keywords.py", line 17, in ignore_ref_siblings
    ref = schema.get("$ref")
          ^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'

This shows an error in jsonschema (not quite at referencing yet!). I had to tinker around with things to find the right trigger conditions, but this seems to do the trick, as a nasty schema:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "definitions": {
    "sub": {
      "foo": {
        "type": "object",
        "properties": {
          "bar": "invalid"
        }
      }
    }
  },
  "properties": {
    "foo": {
      "$ref": "#/definitions/sub/foo"
    }
  }
}

Now, trying to use it triggers an attempt to call .get() on a string:

$ check-jsonschema --schemafile badrefschema.json <(echo '{"foo": {"bar": "baz"}}')

(trimmed trace)

  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 408, in descend
    resolver = self._resolver.in_subresource(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/_core.py", line 689, in in_subresource
    id = subresource.id()
         ^^^^^^^^^^^^^^^^
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/_core.py", line 225, in id
    id = self._specification.id_of(self.contents)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/jsonschema.py", line 50, in _legacy_dollar_id
    id = contents.get("$id")
         ^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'

Proposed Resolution

Obviously, the best resolution is for such a schema to never get written, since it's invalid. :grin: And I'll be trying to advise the upstream provider of the schema against nesting things under definitions, since it weakens the kind of checking which is possible.

But should various referencing internals be more wary that an input may be a dict, a bool, or something unwanted/unexpected? Or should referencing check for bad inputs more aggressively when loading data? The goal here is to have a better and clearer error in this case, not to ignore the malformed schema.

I think the best thing here is that one of the levels of document loading in Referencing checks if the value is dict | boolean. If we only consider JSON Schema, it could be a validator on Resource.contents. That makes the most sense to me, conceptually, for JSON Schema, but it ties in non-generic details of that spec (vs more general $ref resolution). Perhaps JSONSchemaResource(Resource) is the right thing, which adds said validation?

(NB: I'm reading a lot of the internals pretty quickly and for the first time, so my ideas here may not hold up.)

sirosen avatar Jan 17 '24 00:01 sirosen