Routine Load delete.on.null
When Avro-format data is loaded into StarRocks via Routine Load, a message with a null value should delete the corresponding row.
{"paas_id":101810180018513602} | null
paas_id is the primary key; when the message value is null, the row with that key should be deleted.
I tried to set the __op field, but when the value is null, the __op field does not take effect.
@adonis-lau I don't quite understand.
If paas_id is empty or null, what do you want to happen? Delete a row? Which row?
If you use __op=1 to indicate a deletion, you must specify the primary key so that SR knows which row to delete.
@jaogoy I probably didn't explain it clearly.
TiDB's TiCDC tool transmits changed data to Kafka, using Avro for serialization.
Then I used the kafka-avro-console-consumer command to view Avro data in Kafka.
kafka-avro-console-consumer --bootstrap-server kafka_hostname:9092 \
--topic topic_name \
--property schema.registry.url=http://registry_hostname:8081 \
--property print.key=true \
--property key.separator=" | " \
--from-beginning
The part before the | separator is the key; the part after it is the value.
Example data for insert or update:
{"paas_id":101810180018513602} | {"flow_file_id":{"string":"0dd3034e1bfe"},"execute_ym":{"string":"2025-04"},"business_ym":{"string":"2025-03"},"collect_time":{"string":"2025-04-01 09:11:49"},"collect_app_time":null,"data_source":{"string":"1"},"plan_id":{"string":"b09cd398623ae1211"},"paas_is_disable":{"string":"0"},"paas_create_user":null,"paas_create_time":{"string":"2025-04-01 09:11:49"},"paas_update_user":null,"paas_update_time":{"string":"2025-04-01 09:11:49"},"paas_version_no":{"string":""},"paas_is_del":{"string":"0"},"paas_id":101810180018513602,"paas_dynamic_fields":{"string":"{\"distributor_upper\": \"\", \"json_last_update_time\": \"2025-06-06 19:15:05\", \"sdp_code\": \"00000179\"}"},"data_file_id":{"string":"be01ce47fd08"},"is_valid":{"string":""},"_tidb_op":"u","_tidb_commit_ts":458739798905192454,"_tidb_commit_physical_time":1749953456517}
Example data for delete:
{"paas_id":101810180018513602} | null
The first message is inserted or updated normally, but the second message does not delete the row.
The second message is a Kafka tombstone message. https://kafka.apache.org/documentation/#org.apache.kafka.connect.transforms.predicates.RecordIsTombstone https://forum.confluent.io/t/kafka-topic-and-tombstone-messages/3985
Currently, parsing the message key is not supported.
@wyb thank you.
Will you consider supporting it in the future?
I think it is needed, but it is not a high priority at the moment. As a workaround, you can store paas_id as an item in the value object, then use another item to indicate whether the message should delete the row from the database or add it.
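For illustration, the workaround could be applied in a small relay step that sits between the TiCDC topic and the topic Routine Load reads from. The sketch below shows only the record transformation. It assumes the key and value have already been decoded into Python dicts, and that the indicator item is named `__op`, with 1 meaning delete (as mentioned earlier in this thread) and 0 meaning upsert. The function name `to_starrocks_record` is hypothetical and not part of any StarRocks or TiCDC API.

```python
from typing import Any, Dict, Optional

# Assumed indicator values: 1 = delete (per this thread), 0 = upsert.
OP_UPSERT = 0
OP_DELETE = 1


def to_starrocks_record(
    key: Dict[str, Any], value: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build a value-only record that carries the primary key and an op flag.

    A tombstone (value is None) becomes a record holding just the key
    fields plus a delete flag, so the loader never needs to parse the
    Kafka message key.
    """
    if value is None:
        # Tombstone: keep only the primary key and mark it as a delete.
        return {**key, "__op": OP_DELETE}
    # Normal change event: copy the key fields into the value and mark upsert.
    return {**value, **key, "__op": OP_UPSERT}


if __name__ == "__main__":
    tombstone = to_starrocks_record({"paas_id": 101810180018513602}, None)
    print(tombstone)
```

The relay would publish the returned record as the message value, so a delete is no longer represented by a null value.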