karapace icon indicating copy to clipboard operation
karapace copied to clipboard

fix: Schema registry message key field format correction

Open jjaakola-aiven opened this issue 2 years ago • 2 comments

About this change - What it does

References: #347

Confluent Schema Registry message key uses field order of keytype, subject, version, magic and Karapace has used order of subject, version, magic, keytype. Kafka won't compact the topic correctly as the key field byte contents are different.

Confluent Schema Registry uses the most compact format of the JSON in the key, for example {"keytype":"SCHEMA","subject":"foo","version":1,"magic":0}. There is no whitespaces, indent nor linebreaks.

Why this way

First approach On Karapace startup the schema message topic is consumed and each key format checked. If format is not correct a tombstone message and corrected message is sent to the topic.

I used the approach @hackaugusto suggested in the ticket but decided to use the schema reader thread. The limitation is that it cannot call the schema writing functions as those will trigger offset watching and would cause deadlock. Another issue is with registries that have a lot of records and this has a significant memory pressure as invalid entries are first collected in memory and then processed. This was rejected and next approach taken.

Second approach, current solution The correction processing is moved to background thread which is started when Karapace leader is ready. The correction is run per record and consumed from start. There is a special marker record sent to schemas topic when correction is run. This allows to identify if correction has already been run.

Notes

  • Backup restore could also correct the keys for records.

jjaakola-aiven avatar Jul 26 '22 07:07 jjaakola-aiven

Deploy Preview for taupe-pudding-f2e102 failed.

Name Link
Latest commit 66246dc54c81afac74deb83e4bc395d0ee2dda1c
Latest deploy log https://app.netlify.com/sites/taupe-pudding-f2e102/deploys/62ed06d5af5d8d0008089d61

netlify[bot] avatar Aug 01 '22 11:08 netlify[bot]

I think KeyFormatCorrector could be made slightly simpler by removing the flag whether the correction has been already run from that. KeyFormatCorrector could do exactly what its name state, correct the keys. And whether or not, the correction has been run should be outside of KeyFormatCorrector itself.

In practise

  • key_correction_done flag would not exist.
  • If told start, it would start (if not running already). It's the problem of the caller if it has already been run.
  • The marker handling (_send_key_correction_done_marker) would not be in KeyFormatCorrector but in the caller side.
  • When the KeyFormatCorrector is ready, it'd tell using a callback or some other mechanism to its "owner" that it's ready. That entity would then write the marker, i.e. call _send_key_correction_done_marker.

The only problem I can see is that the caller would use a different Kafka producer to write the marker. And that could at least fail.

juha-aiven avatar Aug 17 '22 12:08 juha-aiven

Closing this. The live correction would be too risky and better to be done during the restore of the backup: https://github.com/aiven/karapace/blob/f0406b18864fd016b9ae13cc0030bc2c87599666/karapace/schema_backup.py#L293-L295

jjaakola-aiven avatar Mar 02 '23 07:03 jjaakola-aiven