
fix: Use schema_id while serializing keySchema during INSERT

Open • bvarghese1 opened this issue 1 year ago • 1 comment

Description

  • Previously, we converted both the parsed ksql schema and the parsed SR schema to their canonical string forms and then did a strict string comparison. This failed whenever the SR schema contained additional metadata fields (e.g. doc, connect.name).
  • For example, the following SR parsed schema and ksql parsed schema are identical except for their metadata fields:
    • SR parsed schema string: {"type":"record","name":"MyRecord","namespace":"io.xyz.records","fields":[{"name":"k0","type":["null","string"],"default":null}]}
    • ksql parsed schema string: {"type":"record","name":"KsqlDataSourceSchema","namespace":"io.confluent.ksql.avro_schemas","fields":[{"name":"k0","type":["null","string"],"default":null}]}
  • In this example, both schemas have the same fields (names and types match) in the same order. Only metadata fields, here the record name and namespace, differ.
  • This PR changes that behavior: instead of a strict canonical string comparison, it ignores metadata fields and performs only a field-based compatibility check.
  • Compatibility is checked as follows:
    • The number of columns in the parsed ksql schema must match the number in the parsed SR schema.
    • The columns must appear in the same order, with matching names and types.
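
The two compatibility rules above can be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not the code from this PR: `KeySchemaCompatibility`, `Column` and `isCompatible` are invented names, and the sketch assumes both schemas have already been translated into ksql columns (a name plus a SQL type name). Under that assumption, metadata such as `name`, `namespace`, `doc` or `connect.name` is simply never compared.

```java
import java.util.List;

public class KeySchemaCompatibility {

    // Hypothetical column model: only the column name and its ksql SQL type.
    // Record-level metadata (name, namespace, doc, connect.name) is not represented.
    public record Column(String name, String sqlType) {}

    public static boolean isCompatible(List<Column> ksqlColumns, List<Column> srColumns) {
        // Rule 1: the number of columns must match.
        if (ksqlColumns.size() != srColumns.size()) {
            return false;
        }
        // Rule 2: position by position, name and type must both match.
        for (int i = 0; i < ksqlColumns.size(); i++) {
            Column k = ksqlColumns.get(i);
            Column s = srColumns.get(i);
            if (!k.name().equals(s.name()) || !k.sqlType().equals(s.sqlType())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Column> sr = List.of(new Column("K1", "INTEGER"), new Column("K2", "DOUBLE"));
        List<Column> ksql = List.of(new Column("K1", "INTEGER"), new Column("K2", "DOUBLE"));
        List<Column> reordered = List.of(new Column("K2", "DOUBLE"), new Column("K1", "INTEGER"));
        System.out.println(isCompatible(ksql, sr));      // true
        System.out.println(isCompatible(reordered, sr)); // false: order differs
    }
}
```

Because `Column` is a record, `ksqlColumns.equals(srColumns)` would express both rules in a single call (`List.equals` checks size and element-wise order); the explicit loop just maps each check to one rule.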

Testing done

Updated unit tests. TODO: integration tests (@aliehsaeedii has added a new integration test for INSERT; I plan to add a few tests to it.)

Reviewer checklist

  • [ ] Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • [ ] Ensure relevant issues are linked (description should include text like "Fixes #")

bvarghese1 • Aug 09 '22 02:08

Existing RQTT - https://github.com/confluentinc/ksql/blob/master/ksqldb-functional-tests/src/test/resources/rest-query-validation-tests/insert-values.json

bvarghese1 • Aug 10 '22 16:08

Without the change (master branch):

ksql> insert into input_master(ROWKEY, V1) values(struct(K1:=6, K2:=7.0), 1);
Failed to insert values into 'INPUT_MASTER'. Cannot INSERT VALUES into data source `INPUT_MASTER`. ksqlDB generated schema would overwrite existing key schema.
	Existing Schema: {"type":"record","name":"sampleRecord","namespace":"com.mycorp.mynamespace","doc":"Sample schema to help you get started.","fields":[{"name":"K1","type":"int","doc":"The int type is a 32-bit signed integer."},{"name":"K2","type":"double","doc":"The double type is a double precision (64-bit) IEEE 754 floating-point number."}]}
	ksqlDB Generated: {"type":"record","name":"InputMasterKey","namespace":"io.confluent.ksql.avro_schemas","fields":[{"name":"K1","type":["null","int"],"default":null},{"name":"K2","type":["null","double"],"default":null}],"connect.name":"io.confluent.ksql.avro_schemas.InputMasterKey"}
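
In the error above, the two key schemas differ in metadata (`doc`, `connect.name`, record name and namespace) and also in nullability: the existing schema declares `"int"` while ksqlDB generates `["null","int"]`. That this INSERT succeeds after the change suggests the field types are compared as ksql SQL types rather than as raw Avro JSON. The sketch below assumes exactly that; it is a hypothetical, heavily simplified Java illustration (`AvroToKsqlType` and `toSqlType` are invented names, and only a few Avro primitives are handled), not the PR's implementation.

```java
import java.util.List;

public class AvroToKsqlType {

    // Hypothetical, simplified view of an Avro field type: either a primitive
    // name such as "int", or a union given as a list such as ["null", "int"].
    public static String toSqlType(Object avroType) {
        if (avroType instanceof List<?> union) {
            // Assumption: a nullable union ["null", T] maps to the SQL type of T.
            List<?> nonNull = union.stream().filter(t -> !"null".equals(t)).toList();
            if (nonNull.size() != 1) {
                throw new IllegalArgumentException("Unsupported union: " + union);
            }
            return toSqlType(nonNull.get(0));
        }
        return switch ((String) avroType) {
            case "int" -> "INTEGER";
            case "long" -> "BIGINT";
            case "double" -> "DOUBLE";
            case "string" -> "STRING";
            case "boolean" -> "BOOLEAN";
            default -> throw new IllegalArgumentException("Unsupported type: " + avroType);
        };
    }

    public static void main(String[] args) {
        // Field types of K1 and K2 from the existing (SR) and ksqlDB-generated key schemas.
        System.out.println(toSqlType("int") + " / " + toSqlType(List.of("null", "int")));       // INTEGER / INTEGER
        System.out.println(toSqlType("double") + " / " + toSqlType(List.of("null", "double"))); // DOUBLE / DOUBLE
    }
}
```

With both sides reduced to `K1 INTEGER, K2 DOUBLE`, the column-count and name/type/order rules from the description are satisfied, even though the canonical Avro strings differ.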

With the change (current branch):

ksql> insert into input_master(ROWKEY, V1) values(struct(K1:=6, K2:=7.0), 1);
ksql> select * from input_master;
+---------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------+
|ROWKEY                                                                                 |V1                                                                                     |
+---------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------+
|{K1=6, K2=7.0}                                                                         |1                                                                                      |
Query Completed
Query terminated

bvarghese1 • Aug 10 '22 18:08