kafka-connect-jdbc
Foreign Keys not handled
How can I handle foreign key constraints in the JDBC sink connector? Should I:
- set mode to upsert, so that messages violating constraints are ignored until the rows they reference have been inserted into the foreign table;
- write a caching tool using the Kafka Connect API;
- disable foreign key constraints temporarily (even though Kafka Connect will be running all the time); or
- order messages according to a topological ordering of the MySQL tables?
@adityagupta1089 The connector knows nothing about foreign keys, and this is a pretty difficult issue to tackle. You've already got some good ideas, and hopefully this information will help!
set mode to upsert
I'm not sure how this would change the behavior around foreign key constraints; can you elaborate? Upsert is meant for when you have updates to existing rows, rather than a stream of brand-new rows in your topic. If you've got a foreign key constraint, updating rows instead of inserting them isn't going to change whether that constraint is violated.
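For concreteness, a minimal upsert configuration for this sink connector looks roughly like this (the connector name, topic, connection URL, and key field are placeholders):

```json
{
  "name": "orders-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "orders",
    "connection.url": "jdbc:mysql://localhost:3306/mydb",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "order_id",
    "auto.create": "false"
  }
}
```

Note that with MySQL the connector implements upsert as `INSERT ... ON DUPLICATE KEY UPDATE`, so the referenced parent rows still have to exist; upsert mode doesn't suspend constraint checking.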
write a caching tool using the Kafka Connect API
This is always an option if you find there's no suitable solution already available. I'd encourage you to try to adapt an existing system first, or to make some improvement to this connector, before spending lots of time on this option though. I'm not sure what that implementation would look like, but it sounds like something that could solve your issue.
disable foreign key constraints temporarily
Yes, you're right that the connector needs to be running continuously to write data. Thus, you would have to leave foreign key constraints disabled all the time for the connector to work. This sounds like a non-solution, but it's definitely the easiest to implement.
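If you did go down that road with MySQL, disabling the checks is a session variable; a sketch (every connection the connector opens would need it, e.g. via Connector/J's `sessionVariables` URL option):

```sql
-- Disable foreign key checks for the current session only (MySQL).
-- Every connection the connector opens would need this, e.g. by appending
-- ?sessionVariables=foreign_key_checks=0 to the JDBC connection.url.
SET FOREIGN_KEY_CHECKS = 0;
```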
order messages according to a topological ordering of the MySQL tables
Since you want the data to land in separate tables, it will have to be in different topics to start with. Each connector task makes progress on its topic(s) independently, and nothing would prevent one connector from out-pacing another, except for encountering exceptions.
It might be possible, with some clever error handling, to have the connector see the foreign key constraint exception and know to back off on a topic, giving the other connector time to write the rows that the foreign key references. Maybe a collection of tasks, one per topic-partition, could use that back-off mechanism to abide by the foreign key constraints; a rough sketch follows.
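A minimal sketch of what such a task could look like. The class name, `writeBatch` helper, and back-off interval are hypothetical; `RetriableException` and `SinkTaskContext.timeout()` are the real Connect mechanisms for signaling a retry:

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

/**
 * Hypothetical sink task that backs off when a foreign key constraint is
 * violated, giving the task writing the parent table time to catch up.
 */
public class BackOffOnFkViolationSinkTask extends SinkTask {

    private static final long BACK_OFF_MS = 5_000L;

    @Override
    public void put(Collection<SinkRecord> records) {
        try {
            writeBatch(records); // hypothetical JDBC insert of the batch
        } catch (SQLIntegrityConstraintViolationException e) {
            // Ask the framework to wait before retrying this task...
            context.timeout(BACK_OFF_MS);
            // ...and signal that the same batch should be redelivered.
            throw new RetriableException("FK violation, backing off", e);
        }
    }

    private void writeBatch(Collection<SinkRecord> records)
            throws SQLIntegrityConstraintViolationException {
        // JDBC writes would go here.
    }

    @Override
    public void start(Map<String, String> props) { }

    @Override
    public void stop() { }

    @Override
    public String version() { return "0.0.1"; }
}
```

Throwing `RetriableException` makes the framework redeliver the same batch to `put()` after the timeout, so no records are dropped while the parent table catches up.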
One alternative would be the solution proposed in #734, where one Kafka message could be inserted into multiple tables with foreign key constraints between them. However, that is a large feature that would require a lot of design work, and it would be much more complicated than the solutions you've already brought up.
I'm currently stuck with the same problem, and we have decided not to remove the foreign key constraints from the tables. I searched the internet a lot for a solution but didn't find one. One result pointed to a nested-set JDBC sink connector (https://github.com/findinpath/kafka-connect-nested-set-jdbc-sink), but when I read its documentation I didn't understand it, and I don't know whether it would solve my issue.