
Bug: Schemaverse misinterprets JSON example as schema

Open rnowling-memphis opened this issue 1 year ago • 8 comments

Describe the bug

The Schemaverse UI has an option for generating a schema from an example JSON document. After trying it with a simple example, I pasted the following output from Debezium into the UI to try it on a more complex document:

{
    "schema": {
        "type": "struct",
        "fields": [
            {
                "type": "string",
                "optional": true,
                "name": "io.debezium.data.Json",
                "version": 1,
                "field": "before"
            },
            {
                "type": "string",
                "optional": true,
                "name": "io.debezium.data.Json",
                "version": 1,
                "field": "after"
            },
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "array",
                        "items": {
                            "type": "string",
                            "optional": false
                        },
                        "optional": true,
                        "field": "removedFields"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "name": "io.debezium.data.Json",
                        "version": 1,
                        "field": "updatedFields"
                    },
                    {
                        "type": "array",
                        "items": {
                            "type": "struct",
                            "fields": [
                                {
                                    "type": "string",
                                    "optional": false,
                                    "field": "field"
                                },
                                {
                                    "type": "int32",
                                    "optional": false,
                                    "field": "size"
                                }
                            ],
                            "optional": false,
                            "name": "io.debezium.connector.mongodb.changestream.truncatedarray",
                            "version": 1
                        },
                        "optional": true,
                        "field": "truncatedArrays"
                    }
                ],
                "optional": true,
                "name": "io.debezium.connector.mongodb.changestream.updatedescription",
                "version": 1,
                "field": "updateDescription"
            },
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "string",
                        "optional": false,
                        "field": "version"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "connector"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "name"
                    },
                    {
                        "type": "int64",
                        "optional": false,
                        "field": "ts_ms"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "name": "io.debezium.data.Enum",
                        "version": 1,
                        "parameters": {
                            "allowed": "true,last,false,incremental"
                        },
                        "default": "false",
                        "field": "snapshot"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "db"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "sequence"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "rs"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "collection"
                    },
                    {
                        "type": "int32",
                        "optional": false,
                        "field": "ord"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "lsid"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "txnNumber"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "wallTime"
                    }
                ],
                "optional": false,
                "name": "io.debezium.connector.mongo.Source",
                "field": "source"
            },
            {
                "type": "string",
                "optional": true,
                "field": "op"
            },
            {
                "type": "int64",
                "optional": true,
                "field": "ts_ms"
            },
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "string",
                        "optional": false,
                        "field": "id"
                    },
                    {
                        "type": "int64",
                        "optional": false,
                        "field": "total_order"
                    },
                    {
                        "type": "int64",
                        "optional": false,
                        "field": "data_collection_order"
                    }
                ],
                "optional": true,
                "name": "event.block",
                "version": 1,
                "field": "transaction"
            }
        ],
        "optional": false,
        "name": "tutorial.todo_application.todo_items.Envelope"
    },
    "payload": {
        "before": null,
        "after": "{\"_id\": {\"$oid\": \"645af00af59093471d45c352\"},\"creation_timestamp\": {\"$date\": 1683681290007},\"due_date\": null,\"description\": \"TDEATXVSFKHGTUNPJLKI\",\"completed\": false}",
        "updateDescription": null,
        "source": {
            "version": "2.3.0-SNAPSHOT",
            "connector": "mongodb",
            "name": "tutorial",
            "ts_ms": 1683681290000,
            "snapshot": "false",
            "db": "todo_application",
            "sequence": null,
            "rs": "rs0",
            "collection": "todo_items",
            "ord": 1,
            "lsid": null,
            "txnNumber": null,
            "wallTime": 1683681290009
        },
        "op": "c",
        "ts_ms": 1683681290014,
        "transaction": null
    }
}

The "convert to JSON schema" button is grayed out. I would expect the button to be clickable so that I can extract a schema from the example.
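For context, converting an example into a schema generally means walking the example and recording the type of each value. Memphis's actual implementation is not shown in this issue; the following is a minimal Python sketch of a naive approach, assuming that every key seen in the example is required and that the first element of an array stands for all of its items. `infer_schema` and `schema_from_example` are hypothetical names.

```python
import json


def infer_schema(value):
    """Return a JSON Schema (as a dict) describing a single example value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        if not value:
            return {"type": "array"}
        # Assumption: the first element is representative of every item
        return {"type": "array", "items": infer_schema(value[0])}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
            # Assumption: every key present in the example is required
            "required": list(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")


def schema_from_example(text):
    """Parse an example JSON document and infer a Draft 2020-12 schema for it."""
    schema = {"$schema": "https://json-schema.org/draft/2020-12/schema"}
    schema.update(infer_schema(json.loads(text)))
    return schema


example = '{"op": "c", "ts_ms": 1683681290014, "transaction": null}'
print(json.dumps(schema_from_example(example), indent=2))
```

A single example has limits: in the Debezium message above, "before" is null, so a sketch like this would type it as "null" even though Debezium declares it as an optional string.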

Steps to reproduce

  1. Go to the Schemaverse tab
  2. Click create new schema
  3. Select the JSON data format
  4. Paste the above example into the Schema Structure box

Affected services

UI

Platforms

Docker

If UI - Browsers

Firefox

Environment

Testing

Additional context

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

rnowling-memphis, May 10 '23 01:05

@rnowling-memphis, the button is grayed out because the pasted text already has a JSON Schema structure and doesn't need conversion. Correct me if I am wrong, @avrhamNeeman. Either way, it should be clearer. @avrhamNeeman, maybe we can catch the paste event, check whether the input is already a JSON Schema, and, instead of just graying out the button, show a message such as "JSON Schema identified".

yanivbh1, May 10 '23 06:05
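The suggestion of checking pasted input can be sketched as a heuristic classifier. This is a minimal Python sketch, not Memphis's UI code; the keyword hint set and the name `classify_paste` are assumptions. By this heuristic the Debezium message would count as an example rather than a schema, because its top-level keys ("schema" and "payload") are not JSON Schema keywords.

```python
import json

# JSON Schema type names (the meta-schema's "simpleTypes")
SIMPLE_TYPES = {"array", "boolean", "integer", "null", "number", "object", "string"}

# Assumption: keywords that strongly suggest the text is already a JSON Schema
SCHEMA_HINTS = {"$schema", "$id", "$ref", "properties", "items", "required",
                "allOf", "anyOf", "oneOf"}


def classify_paste(text):
    """Classify pasted text as 'invalid', 'schema', or 'example' (heuristic)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return "invalid"
    if not isinstance(doc, dict):
        return "example"
    if doc.keys() & SCHEMA_HINTS:
        return "schema"
    # A "type" whose value is a JSON Schema type name (or list of them)
    declared = doc.get("type")
    if isinstance(declared, str):
        declared = [declared]
    if (isinstance(declared, list) and declared
            and all(isinstance(t, str) and t in SIMPLE_TYPES for t in declared)):
        return "schema"
    return "example"


print(classify_paste('{"schema": {"type": "struct"}, "payload": {"op": "c"}}'))  # example
```

An example document that happens to contain a key such as "properties" would be misclassified, so a UI built on this idea would still want to let the user override the guess.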

@yanivbh1 You're right! This is already a JSON Schema structure. Regarding the indication: if you paste something invalid, you will get an error, and the Validate button provides an additional verification option.

avrhamNeeman, May 10 '23 08:05

Okay, so yes, it technically passes validation with the Python jsonschema package. That is, the following does not raise an error:

import jsonschema

# schema: the full Debezium message above, loaded with json.loads
jsonschema.Draft202012Validator.check_schema(schema)

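But that's only because it's missing the "type" and "properties" keywords. Its top-level keys ("schema" and "payload") are not JSON Schema keywords, so they are ignored, and the document is interpreted as an empty schema that will pass anything: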

jsonschema.validate({"firstname": "RJ", "lastname": "Nowling"}, schema)

This obviously isn't what we want.

If we pull out just the "schema" field and check that as a schema instead, we get an error:

jsonschema.Draft202012Validator.check_schema(schema["schema"])
SchemaError: 'struct' is not valid under any of the given schemas

Failed validating 'anyOf' in metaschema['allOf'][3]['properties']['type']:
    {'anyOf': [{'$ref': '#/$defs/simpleTypes'},
               {'items': {'$ref': '#/$defs/simpleTypes'},
                'minItems': 1,
                'type': 'array',
                'uniqueItems': True}]}

On schema['type']:
    'struct'

This is because what Debezium produces is not a JSON Schema-compatible schema description; for example, "struct" is not a JSON Schema type.

Maybe we need to deviate from the JSON Schema specification by not allowing an empty ({}) or true schema that validates everything? What would be the point of allowing such a schema? In that case, the user should just turn Schemaverse off.

rnowling-memphis, May 10 '23 09:05
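The idea of rejecting schemas that validate everything could start with a top-level check like the following. It is a minimal, stdlib-only sketch: the keyword list is a hand-picked subset of JSON Schema 2020-12 keywords, `accepts_everything` is a hypothetical name, and nested subschemas are not inspected (so `{"allOf": [{}]}` would still be treated as constraining).

```python
# Assumption: a hand-picked subset of JSON Schema 2020-12 keywords that may
# constrain instances. Annotation-only keywords such as "title",
# "description", "$schema", "default" and "examples" are deliberately absent.
CONSTRAINING_KEYWORDS = frozenset({
    "type", "enum", "const", "properties", "required", "additionalProperties",
    "patternProperties", "propertyNames", "minProperties", "maxProperties",
    "dependentRequired", "dependentSchemas", "unevaluatedProperties",
    "items", "prefixItems", "contains", "minItems", "maxItems", "uniqueItems",
    "unevaluatedItems", "minLength", "maxLength", "pattern",
    "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum", "multipleOf",
    "allOf", "anyOf", "oneOf", "not", "if", "$ref", "$dynamicRef",
})


def accepts_everything(schema):
    """Heuristic: True if the schema's top level cannot reject any instance."""
    if schema is True:
        return True  # the boolean schema `true` accepts everything
    if not isinstance(schema, dict):
        return False  # `false` rejects everything; anything else is left to the meta-schema check
    return not (schema.keys() & CONSTRAINING_KEYWORDS)


print(accepts_everything({"test": "test"}))   # True
print(accepts_everything({"type": "object"}))  # False
```

Because the JSON Schema specification allows such schemas, rejecting them would be a product decision layered on top of the normal meta-schema check, not part of validation itself.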

Adding @idanasulinmemphis and @avrhamNeeman. That's definitely not the desired behaviour.

yanivbh1, May 10 '23 10:05

@rnowling-memphis, thanks for the clarification! We understand the issue: any JSON structure passes validation, even if it is not a JSON Schema structure. We need to change the validation logic so that it passes only actual JSON Schema structures.

avrhamNeeman, May 10 '23 11:05

Thanks for your attention, @rnowling-memphis. A JSON object is a valid JSON Schema even if its keys are not JSON Schema keywords. For example, if I create the schema {"test":"test"}, it is a valid JSON Schema, but it has no validation rules or constraints, so every message passes and you will not get errors. If you instead create a JSON Schema with a structure like this: { "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "test": { "type": "string" } }, "required": ["test"] }

Then you will get errors whenever a message violates the schema's constraints.

shohamroditimemphis, May 10 '23 14:05

@shohamroditimemphis I agree. I just don't think that's a great user experience. I think it should accept only a subset of valid schemas: those that actually filter messages. But that may be too difficult to accomplish.

rnowling-memphis, May 10 '23 14:05

I definitely agree with you, @rnowling-memphis. We will do our best to fix this.

shohamroditimemphis, May 10 '23 14:05