fix: Schema registry message key field format correction
About this change - What it does
References: #347
Confluent Schema Registry message keys use the field order keytype, subject, version, magic, while Karapace has used the order subject, version, magic, keytype. Because the key byte contents differ, Kafka won't compact the topic correctly. Confluent Schema Registry also uses the most compact JSON form for the key, for example `{"keytype":"SCHEMA","subject":"foo","version":1,"magic":0}`: no whitespace, no indentation, no line breaks.
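For reference, a key can be re-encoded into that layout with plain `json.dumps`; the helper below is only an illustration (the function name is not from Karapace), relying on Python 3.7+ dict insertion ordering:

```python
import json

def canonical_key_bytes(raw_key: bytes) -> bytes:
    """Re-encode a schema message key in Confluent's field order and compact form."""
    fields = json.loads(raw_key)
    order = ("keytype", "subject", "version", "magic")
    # Known fields first in Confluent's order, anything else appended afterwards.
    fixed = {name: fields[name] for name in order if name in fields}
    fixed.update({k: v for k, v in fields.items() if k not in order})
    # separators=(",", ":") removes all whitespace, matching the compact format.
    return json.dumps(fixed, separators=(",", ":")).encode("utf-8")

assert (
    canonical_key_bytes(b'{"subject":"foo","version":1,"magic":0,"keytype":"SCHEMA"}')
    == b'{"keytype":"SCHEMA","subject":"foo","version":1,"magic":0}'
)
```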
Why this way
First approach: on Karapace startup the schemas topic is consumed and each key format is checked. If the format is not correct, a tombstone message and a corrected message are sent to the topic.
I used the approach @hackaugusto suggested in the ticket but decided to use the schema reader thread. The limitation is that it cannot call the schema writing functions, as those would trigger offset watching and cause a deadlock. Another issue is memory pressure on registries with a lot of records, since invalid entries are first collected in memory and then processed (see the sketch below). This approach was rejected in favor of the next one.
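To illustrate where the memory pressure comes from, here is a rough sketch of that rejected pass (the names and types are illustrative, not Karapace code): every mis-keyed entry is buffered until the full scan finishes, and only then would the tombstones and corrected messages be produced.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[bytes, bytes]  # (key, value) raw bytes from the schemas topic

def collect_mis_keyed(
    records: Iterable[Record],
    canonicalize_key: Callable[[bytes], bytes],
) -> List[Tuple[bytes, Record]]:
    """Scan the whole topic first, keeping every mis-keyed record in memory."""
    pending: List[Tuple[bytes, Record]] = []
    for key, value in records:
        fixed = canonicalize_key(key)
        if fixed != key:
            # The old record plus its corrected key stay in this list until the
            # scan completes -- hence the memory pressure on large registries.
            pending.append((fixed, (key, value)))
    return pending
```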
Second approach, current solution: the correction processing is moved to a background thread that is started when the Karapace leader is ready. The correction runs per record, consuming the topic from the start. A special marker record is sent to the schemas topic when the correction is run, which makes it possible to tell whether the correction has already been run.
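A minimal sketch of that per-record flow, using kafka-python purely as a stand-in client; the topic name, marker payload and function names are assumptions for illustration, not the actual Karapace implementation, and in the real change this loop runs in the background thread once the leader is ready:

```python
import json

from kafka import KafkaConsumer, KafkaProducer  # stand-in client for the sketch

SCHEMAS_TOPIC = "_schemas"  # assumed topic name

def canonical_key(raw_key: bytes) -> bytes:
    """Re-encode the key in Confluent's field order, compact JSON."""
    fields = json.loads(raw_key)
    order = ("keytype", "subject", "version", "magic")
    fixed = {name: fields[name] for name in order if name in fields}
    fixed.update({k: v for k, v in fields.items() if k not in order})
    return json.dumps(fixed, separators=(",", ":")).encode("utf-8")

def run_key_correction(bootstrap_servers: str) -> None:
    consumer = KafkaConsumer(
        SCHEMAS_TOPIC,
        bootstrap_servers=bootstrap_servers,
        auto_offset_reset="earliest",  # consume the topic from the start
        enable_auto_commit=False,
        consumer_timeout_ms=10_000,    # stop once no new records arrive (simplification)
    )
    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for message in consumer:
        if message.key is None:
            continue
        fixed = canonical_key(message.key)
        if fixed != message.key:
            # Tombstone the mis-formatted key so compaction drops it,
            # then re-send the value under the corrected key.
            producer.send(SCHEMAS_TOPIC, key=message.key, value=None)
            producer.send(SCHEMAS_TOPIC, key=fixed, value=message.value)
    # Hypothetical "correction done" marker; the real marker record Karapace
    # writes may look different.
    producer.send(
        SCHEMAS_TOPIC,
        key=b'{"keytype":"NOOP","magic":0}',
        value=b'{"key_correction_done":true}',
    )
    producer.flush()
```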
Notes
- Backup restore could also correct the record keys.
I think `KeyFormatCorrector` could be made slightly simpler by removing from it the flag for whether the correction has already been run. `KeyFormatCorrector` could do exactly what its name states, correct the keys, and whether or not the correction has been run should live outside of `KeyFormatCorrector` itself.

In practice:

- The `key_correction_done` flag would not exist.
- If told to start, it would start (if not running already). It's the caller's problem if the correction has already been run.
- The marker handling (`_send_key_correction_done_marker`) would not be in `KeyFormatCorrector` but on the caller side.
- When `KeyFormatCorrector` is ready, it would tell its "owner" via a callback or some other mechanism that it's ready. That entity would then write the marker, i.e. call `_send_key_correction_done_marker`.
The only problem I can see is that the caller would use a different Kafka producer to write the marker, and that could at least fail.
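Roughly what that split could look like; a sketch only, where the owner class and constructor arguments are assumptions and only the `KeyFormatCorrector`, `key_correction_done` and `_send_key_correction_done_marker` names come from the discussion above:

```python
from typing import Callable

class KeyFormatCorrector:
    """Does only what its name says: corrects the keys, then reports completion."""

    def __init__(self, on_done: Callable[[], None]) -> None:
        self._on_done = on_done
        self._running = False

    def start(self) -> None:
        # No key_correction_done flag here: whether the correction has already
        # been run is the caller's problem.
        if self._running:
            return
        self._running = True
        try:
            self._correct_all_keys()
            self._on_done()  # tell the owner we are finished
        finally:
            self._running = False

    def _correct_all_keys(self) -> None:
        ...  # consume the schemas topic and rewrite mis-formatted keys


class SchemaRegistryLeader:
    """Owner side: decides whether to run the corrector and writes the marker."""

    def __init__(self) -> None:
        self._corrector = KeyFormatCorrector(on_done=self._send_key_correction_done_marker)

    def maybe_correct_keys(self, correction_already_done: bool) -> None:
        if not correction_already_done:
            self._corrector.start()

    def _send_key_correction_done_marker(self) -> None:
        ...  # produce the marker record, possibly with a separate Kafka producer
```

With this split the "already done" check and the marker write live entirely on the owner side, which is where the separate-producer concern above comes in.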
Closing this. The live correction would be too risky; it is better done during the restore of a backup: https://github.com/aiven/karapace/blob/f0406b18864fd016b9ae13cc0030bc2c87599666/karapace/schema_backup.py#L293-L295
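If it lands in the restore path, it could look roughly like this (a sketch under stated assumptions; the function names and the shape of the backup records are illustrative, not the actual schema_backup.py code):

```python
from typing import Callable, Iterable, Tuple

BackupRecord = Tuple[bytes, bytes]  # (key, value) pairs read from the backup

def restore_with_corrected_keys(
    backup_records: Iterable[BackupRecord],
    canonicalize_key: Callable[[bytes], bytes],
    produce: Callable[[bytes, bytes], None],
) -> None:
    """Rewrite each key into the Confluent field order while replaying a backup."""
    for key, value in backup_records:
        produce(canonicalize_key(key), value)
```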