ksql
fix: Use schema_id while serializing keySchema during INSERT
Description
- Previously, we converted both the parsed ksql schema and the parsed SR schema to canonical strings and then did a strict string comparison. This failed whenever the SR schema contained additional metadata fields (e.g. doc, connect.name).
- As an example, the following SR parsed schema and KSQL parsed schema are identical except for their metadata fields:
  - SR parsed schema string
    {"type":"record","name":"MyRecord","namespace":"io.xyz.records","fields":[{"name":"k0","type":["null","string"],"default":null}]}
  - Ksql parsed schema string
    {"type":"record","name":"KsqlDataSourceSchema","namespace":"io.confluent.ksql.avro_schemas","fields":[{"name":"k0","type":["null","string"],"default":null}]}
- In the above example, both schemas have the same fields (names and types match) in the same order. However, metadata fields such as name and namespace differ.
- This PR changes this behavior: instead of a strict canonical string comparison, it ignores the metadata fields and performs only a field-based compatibility check.
- Compatibility is checked as follows:
  - The number of columns in the parsed ksql schema must match the number of columns in the parsed SR schema.
  - The order of the columns, along with each column's name and type, must match.
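The rules above can be sketched as follows. This is a minimal, hypothetical Java sketch, not the code in this PR: the Column record and the strictStringMatch and isCompatible methods are illustrative names. It assumes both schemas have already been parsed into an ordered list of columns with normalized type names, so record-level metadata (name, namespace, doc, connect.name) never reaches the comparison.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: contrasts the old strict string comparison with a
// field-based compatibility check. Not the actual ksqlDB implementation.
class KeySchemaCompat {

    // Hypothetical column model. Only name and a normalized type name are kept;
    // record-level metadata (name, namespace, doc, connect.name) is not modeled.
    record Column(String name, String type) {}

    // Old behavior (sketch): strict comparison of canonical schema strings.
    static boolean strictStringMatch(String ksqlSchema, String srSchema) {
        return ksqlSchema.equals(srSchema);
    }

    // New behavior (sketch): field-based compatibility check.
    static boolean isCompatible(List<Column> ksqlColumns, List<Column> srColumns) {
        // Rule 1: the number of columns must match.
        if (ksqlColumns.size() != srColumns.size()) {
            return false;
        }
        // Rule 2: columns must appear in the same order with the same name and type.
        for (int i = 0; i < ksqlColumns.size(); i++) {
            Column k = ksqlColumns.get(i);
            Column s = srColumns.get(i);
            if (!Objects.equals(k.name(), s.name()) || !Objects.equals(k.type(), s.type())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String sr = "{\"type\":\"record\",\"name\":\"MyRecord\",\"namespace\":\"io.xyz.records\","
                + "\"fields\":[{\"name\":\"k0\",\"type\":[\"null\",\"string\"],\"default\":null}]}";
        String ksql = "{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\","
                + "\"namespace\":\"io.confluent.ksql.avro_schemas\","
                + "\"fields\":[{\"name\":\"k0\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

        // The strings differ only in name/namespace, so the strict check fails.
        System.out.println(strictStringMatch(ksql, sr));

        // After parsing (assumed), both schemas reduce to the same single column.
        List<Column> srColumns = List.of(new Column("k0", "STRING"));
        List<Column> ksqlColumns = List.of(new Column("k0", "STRING"));
        System.out.println(isCompatible(ksqlColumns, srColumns));
    }
}
```

Running main prints false and then true: the two example schema strings differ only in name and namespace, so the strict comparison rejects them while the field-based check accepts them.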
Testing done
- Updated unit tests.
- TODO: integration tests (@aliehsaeedii has added a new integration test for INSERT; I plan to add a few tests to it.)
Reviewer checklist
- [ ] Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
- [ ] Ensure relevant issues are linked (description should include text like "Fixes #")
Existing RQTT - https://github.com/confluentinc/ksql/blob/master/ksqldb-functional-tests/src/test/resources/rest-query-validation-tests/insert-values.json
Without the change (master branch):
ksql> insert into input_master(ROWKEY, V1) values(struct(K1:=6, K2:=7.0), 1);
Failed to insert values into 'INPUT_MASTER'. Cannot INSERT VALUES into data source `INPUT_MASTER`. ksqlDB generated schema would overwrite existing key schema.
Existing Schema: {"type":"record","name":"sampleRecord","namespace":"com.mycorp.mynamespace","doc":"Sample schema to help you get started.","fields":[{"name":"K1","type":"int","doc":"The int type is a 32-bit signed integer."},{"name":"K2","type":"double","doc":"The double type is a double precision (64-bit) IEEE 754 floating-point number."}]}
ksqlDB Generated: {"type":"record","name":"InputMasterKey","namespace":"io.confluent.ksql.avro_schemas","fields":[{"name":"K1","type":["null","int"],"default":null},{"name":"K2","type":["null","double"],"default":null}],"connect.name":"io.confluent.ksql.avro_schemas.InputMasterKey"}
With the change (current branch):
ksql> insert into input_master(ROWKEY, V1) values(struct(K1:=6, K2:=7.0), 1);
ksql> select * from input_master;
+---------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------+
|ROWKEY |V1 |
+---------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------+
|{K1=6, K2=7.0} |1 |
Query Completed
Query terminated