memphis
Bug: Schemaverse misinterprets JSON example as schema
Describe the bug
The Schemaverse UI has an option for generating a schema from an example JSON document. After trying it with a simple example, I pasted the following Debezium output into the UI to test it on a more complex document:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Json",
"version": 1,
"field": "before"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Json",
"version": 1,
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "array",
"items": {
"type": "string",
"optional": false
},
"optional": true,
"field": "removedFields"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Json",
"version": 1,
"field": "updatedFields"
},
{
"type": "array",
"items": {
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "field"
},
{
"type": "int32",
"optional": false,
"field": "size"
}
],
"optional": false,
"name": "io.debezium.connector.mongodb.changestream.truncatedarray",
"version": 1
},
"optional": true,
"field": "truncatedArrays"
}
],
"optional": true,
"name": "io.debezium.connector.mongodb.changestream.updatedescription",
"version": 1,
"field": "updateDescription"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false,incremental"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": true,
"field": "sequence"
},
{
"type": "string",
"optional": false,
"field": "rs"
},
{
"type": "string",
"optional": false,
"field": "collection"
},
{
"type": "int32",
"optional": false,
"field": "ord"
},
{
"type": "string",
"optional": true,
"field": "lsid"
},
{
"type": "int64",
"optional": true,
"field": "txnNumber"
},
{
"type": "int64",
"optional": true,
"field": "wallTime"
}
],
"optional": false,
"name": "io.debezium.connector.mongo.Source",
"field": "source"
},
{
"type": "string",
"optional": true,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "id"
},
{
"type": "int64",
"optional": false,
"field": "total_order"
},
{
"type": "int64",
"optional": false,
"field": "data_collection_order"
}
],
"optional": true,
"name": "event.block",
"version": 1,
"field": "transaction"
}
],
"optional": false,
"name": "tutorial.todo_application.todo_items.Envelope"
},
"payload": {
"before": null,
"after": "{\"_id\": {\"$oid\": \"645af00af59093471d45c352\"},\"creation_timestamp\": {\"$date\": 1683681290007},\"due_date\": null,\"description\": \"TDEATXVSFKHGTUNPJLKI\",\"completed\": false}",
"updateDescription": null,
"source": {
"version": "2.3.0-SNAPSHOT",
"connector": "mongodb",
"name": "tutorial",
"ts_ms": 1683681290000,
"snapshot": "false",
"db": "todo_application",
"sequence": null,
"rs": "rs0",
"collection": "todo_items",
"ord": 1,
"lsid": null,
"txnNumber": null,
"wallTime": 1683681290009
},
"op": "c",
"ts_ms": 1683681290014,
"transaction": null
}
}
The "convert to JSON schema" button is grayed out. I would expect the button to be clickable and to let me generate a schema from the example.
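For reference, the conversion the reporter expected (example document in, schema out) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Schemaverse's implementation: the helper name `infer_schema` is hypothetical, every key seen in the example is assumed to be required, and array items are assumed to share the shape of the first item.

```python
import json


def infer_schema(value):
    """Infer a minimal JSON Schema from a single example value (hypothetical helper)."""
    # Check bool before int: in Python, bool is a subclass of int.
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Assumption: every item has the same shape as the first one.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            # Assumption: every key present in the example is required.
            "required": list(value),
        }
    raise TypeError(f"unsupported JSON value: {value!r}")


# A small example in the shape of the Debezium payload fields above.
example = json.loads('{"op": "c", "ts_ms": 1683681290014, "before": null}')
schema = {"$schema": "https://json-schema.org/draft/2020-12/schema", **infer_schema(example)}
print(json.dumps(schema, indent=2))
```

A real converter would also need to merge the shapes of differing array items and decide which fields are optional, which a single example cannot reveal.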
Steps to reproduce
- Go to the Schemaverse tab
- Click create new schema
- Select the JSON data format
- Paste the above example into the Schema Structure box
Affected services
UI
Platforms
Docker
If UI - Browsers
Firefox
Environment
Testing
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@rnowling-memphis, the button is grayed out because the document already has a JSON Schema structure and doesn't need conversion. Correct me if I am wrong, @avrhamNeeman. Either way, it should be clearer. @avrhamNeeman, maybe we can catch the "Paste" event, check whether the content is already a JSON Schema, and, instead of just graying out the button, show a message such as "JSON Schema identified".
@yanivbh1 You're right! This is already a JSON Schema structure. Regarding the indication: if you paste something invalid, you will get an error, and the Validate button provides an additional check.
Okay, so yes, it technically passes validation with the Python jsonschema package, i.e., the following doesn't raise an error:
jsonschema.Draft202012Validator.check_schema(schema)
But that's only because it has no type or properties keywords at the top level. Validators ignore unknown keywords such as schema and payload, so the document is interpreted as an empty schema that will pass anything:
jsonschema.validate({"firstname" : "RJ", "lastname" : "Nowling"}, schema)
This obviously isn't what we want.
If we pull out just the schema field and try to validate that, we get an error:
jsonschema.Draft202012Validator.check_schema(schema["schema"])
SchemaError: 'struct' is not valid under any of the given schemas
Failed validating 'anyOf' in metaschema['allOf'][3]['properties']['type']:
{'anyOf': [{'$ref': '#/$defs/simpleTypes'},
{'items': {'$ref': '#/$defs/simpleTypes'},
'minItems': 1,
'type': 'array',
'uniqueItems': True}]}
On schema['type']:
'struct'
This is because the schema Debezium produces is not a JSON Schema: it uses its own type names, such as struct, int32, and int64, which JSON Schema does not define.
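The metaschema error above can be reproduced without the jsonschema package by walking the document and collecting every "type" value that is not one of JSON Schema's seven primitive type names. This is a rough sketch: `invalid_types` is a hypothetical helper, not part of any library, and because it walks every nested object it could also flag data that merely contains a "type" key (for example inside "const" or "enum").

```python
# The primitive type names that the JSON Schema "type" keyword accepts.
JSON_SCHEMA_TYPES = {"array", "boolean", "integer", "null", "number", "object", "string"}


def invalid_types(node, path="$"):
    """Yield (path, type) for every string "type" value JSON Schema does not define."""
    if isinstance(node, dict):
        declared = node.get("type")
        if isinstance(declared, str) and declared not in JSON_SCHEMA_TYPES:
            yield path, declared
        for key, child in node.items():
            yield from invalid_types(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from invalid_types(child, f"{path}[{index}]")


# A trimmed excerpt of the Debezium "schema" field from the report.
debezium_schema = {
    "type": "struct",
    "fields": [
        {"type": "string", "optional": True, "field": "op"},
        {"type": "int64", "optional": True, "field": "ts_ms"},
    ],
}

for path, declared in invalid_types(debezium_schema):
    print(path, declared)
# prints:
# $ struct
# $.fields[1] int64
```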
Maybe we should deviate from the JSON Schema specification by rejecting an empty or true schema that validates everything? What would be the point of allowing such a schema? In that case, the user should just turn Schemaverse off.
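The stricter rule proposed here could look roughly like the following sketch. The function `is_vacuous` and the `CONSTRAINT_KEYWORDS` allow-list are hypothetical names; the list is deliberately incomplete (a schema using only unlisted keywords such as `multipleOf` would be misreported as vacuous), and only the top level is inspected, so something like `{"properties": {}}` would slip through.

```python
# Hypothetical allow-list: a subset of JSON Schema keywords that constrain instances.
CONSTRAINT_KEYWORDS = {
    "type", "enum", "const", "properties", "required", "items", "prefixItems",
    "contains", "additionalProperties", "patternProperties", "allOf", "anyOf",
    "oneOf", "not", "$ref", "minimum", "maximum", "minLength", "maxLength",
    "pattern", "minItems", "maxItems", "minProperties", "maxProperties",
}


def is_vacuous(schema):
    """Return True if the schema would (approximately) accept every JSON document."""
    if isinstance(schema, bool):
        # The `true` schema accepts everything; the `false` schema rejects everything.
        return schema
    if not isinstance(schema, dict):
        raise TypeError("a JSON Schema must be an object or a boolean")
    # Unknown keywords such as "schema" or "payload" are ignored by validators,
    # so a schema with none of the constraint keywords behaves like {}.
    return not any(keyword in CONSTRAINT_KEYWORDS for keyword in schema)


debezium_message = {"schema": {"type": "struct"}, "payload": {"op": "c"}}
print(is_vacuous(debezium_message))                            # True
print(is_vacuous({"test": "test"}))                            # True
print(is_vacuous({"type": "object", "required": ["test"]}))    # False
```

A check like this would let the UI refuse, or at least warn about, a pasted document that validates every message.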
Adding @idanasulinmemphis and @avrhamNeeman. That's definitely not the desired behaviour.
@rnowling-memphis, thanks for the clarification! We understand the issue: every JSON document passes validation, even if it isn't a JSON Schema structure. We need to change the validation logic so that it only passes actual JSON Schema structures.
Thanks for your attention, @rnowling-memphis. Any JSON object is valid in JSON Schema terms, so if I create a schema such as {"test":"test"}, it is a valid JSON Schema, but it has no validation rules or constraints, so every message passes and you will not get errors.
If you instead create a JSON Schema with this structure, for example:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"test": {
"type": "string"
}
},
"required": ["test"]
}
You will get errors whenever a message violates the schema's constraints.
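To make the difference between the two schemas concrete, here is a hand-written sketch that understands only the three keywords used in the constrained example (`type`, `properties`, `required`). It is not how Schemaverse or the jsonschema package validates messages; the function `check` and the `PYTHON_TYPES` mapping are illustrative assumptions covering just the types this example needs.

```python
import json

schema = json.loads("""
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {"test": {"type": "string"}},
  "required": ["test"]
}
""")

# Maps the JSON Schema type names used above to Python types (subset only).
PYTHON_TYPES = {"object": dict, "string": str}


def check(instance, schema):
    """Return error messages for the type/properties/required keywords only."""
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, PYTHON_TYPES[expected]):
        return [f"{instance!r} is not of type {expected!r}"]
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"{name!r} is a required property")
    for name, subschema in schema.get("properties", {}).items():
        if name in instance:
            # Validate each present property against its own subschema.
            errors.extend(check(instance[name], subschema))
    return errors


print(check({"test": "hello"}, schema))      # []
print(check({"other": 1}, schema))           # ["'test' is a required property"]
print(check({"test": 42}, schema))           # ["42 is not of type 'string'"]
print(check({"test": 42}, {"test": "test"})) # [] (no constraints, so anything passes)
```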
@shohamroditimemphis I agree. I just don't think that's a great user experience. I think it should only accept a subset of valid schemas -- those that will actually do filtering. But it may be too difficult to accomplish.
I definitely agree with you, @rnowling-memphis. We will do our best to fix this.