CDC-Tools
transactional mode
Could you possibly explain this? Many thanks :)
Check my blog post: https://jack-vanlightly.com/blog/2018/4/28/sql-server-cdc-to-redshift-pipeline It explains the different strategies that either respect global or table-level transactional boundaries, and those that simply stream changes as they occur. Reading from a transaction log directly would make grouping the changes of a given transaction easy (if you can decode the log), but SQL Server CDC makes changes available at a table level. I wanted to experiment with solutions that could export the changes of the same db transaction together, regardless of whether the transaction crossed multiple tables or was very large.
Let me know if that blog post adequately explains it.
Thanks for that! Yes I see what you mean now.
Might another approach be to put insert triggers on all the CDC tables - converting their contents to JSON and inserting them into one common table? Then you'd only need to scan that one table rather than each CDC table.
I have a hundred or so tables I want to CDC...
It is a tempting idea as it makes the solution simpler, but it does have two drawbacks:
- You'll lose the natural order of the events. From my experiments, changes get written to the CDC tables in batches, losing the true total ordering of the transaction log. In an experiment with triggers I also found that each batch gets written in reverse order. You could try to reorder the changes in a database query, but that could prove difficult.
- With CDC enabled we are already adding extra load on the server: each write ends up being written again to a CDC table. This trigger technique involves yet another write, in JSON format, potentially increasing overall write load dramatically.
If you experiment with the trigger idea let me know how it goes.
It's a shame that Microsoft doesn't make a better programmatic interface to the transaction log available - then one could roll one's own CDC system.
I've never used them, but products such as Attunity and Debezium read the transaction log directly; they might offer the functionality you need.
I've made a bit of progress and have implemented an app to stream CDC using Node.js.
I'll put it on GitHub and link you to it, but essentially:
1) maxLsn = getMaxLsn()
2) if maxLsn has changed since last time
3) for all tables - get changes between the last LSN obtained for that table and maxLsn
4) now we can publish the changes, and we can order them across all tables
5) store the latest LSN and seqval obtained for each table
6) go back to 1)
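The loop above can be sketched in Node.js roughly like this - a minimal, hypothetical sketch where `getMaxLsn`, `getChanges`, and `publish` are stand-ins for the real calls (in SQL Server these would query `sys.fn_cdc_get_max_lsn` and `cdc.fn_cdc_get_all_changes_<capture_instance>`), and plain numbers stand in for the real binary(10) LSN values:

```javascript
// Minimal sketch of the polling loop. All names here are hypothetical
// stand-ins; the real app talks to SQL Server's CDC functions.
function createPoller({ getMaxLsn, getChanges, publish, tables }) {
  let lastMaxLsn = null;                 // last maxLsn seen (step 5 state)
  const lastLsnPerTable = {};            // per-table low-water marks

  return function pollOnce() {
    const maxLsn = getMaxLsn();          // step 1
    if (maxLsn === lastMaxLsn) return 0; // step 2: nothing new, cheap exit

    let published = 0;
    for (const table of tables) {        // step 3: scan each table's changes
      const from = lastLsnPerTable[table] ?? 0;
      const changes = getChanges(table, from, maxLsn);
      publish(table, changes);           // step 4: publish (ordering possible here)
      lastLsnPerTable[table] = maxLsn;   // step 5: remember where we got to
      published += changes.length;
    }
    lastMaxLsn = maxLsn;
    return published;                    // caller loops back to step 1 (step 6)
  };
}

// Usage with a fake in-memory change store:
const store = { orders: [{ lsn: 1 }, { lsn: 2 }], items: [{ lsn: 2 }] };
const poller = createPoller({
  getMaxLsn: () => 2,
  getChanges: (t, from, to) => store[t].filter(c => c.lsn > from && c.lsn <= to),
  publish: (t, rows) => console.log(t, rows.length),
  tables: ['orders', 'items'],
});
poller(); // first pass publishes all 3 changes
poller(); // maxLsn unchanged: does nothing
```

The key property is that a no-change poll costs only step 1, matching the first improvement below.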
There are two improvements:
- when there are no changes, all we do is step 1), so there's hardly any impact on the database.
- in step 4) we have all the changes for all the tables up to maxLsn, so we can order them and send them to a single topic - or send them tagged with maxLsn, and also send a maxLsn event, so we could order them on Kafka.
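The cross-table ordering in step 4) could be sketched like this - a hypothetical example assuming each change row carries its `__$start_lsn` and `__$seqval` as byte arrays (SQL Server returns them as binary(10) values), so changes of the same transaction end up adjacent and in commit order:

```javascript
// Compare two binary LSN/seqval values represented as byte arrays,
// as SQL Server's binary(10) columns would be.
function compareBytes(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Merge per-table CDC changes into one totally ordered stream by
// (__$start_lsn, __$seqval): same-transaction rows stay together.
function orderChanges(perTableChanges) {
  return perTableChanges
    .flat()
    .sort((x, y) =>
      compareBytes(x.lsn, y.lsn) || compareBytes(x.seqval, y.seqval));
}

// Hypothetical changes gathered from two tables up to maxLsn:
const orders   = [{ table: 'orders',   lsn: [0, 2], seqval: [0, 1], op: 'insert' }];
const payments = [{ table: 'payments', lsn: [0, 1], seqval: [0, 1], op: 'insert' },
                  { table: 'payments', lsn: [0, 2], seqval: [0, 2], op: 'update' }];

const ordered = orderChanges([orders, payments]);
// The payment at lsn [0,1] comes first; the two changes at lsn [0,2]
// (one per table) end up adjacent, grouped as a single transaction.
console.log(ordered.map(c => `${c.table}:${c.op}`));
```

Publishing this merged stream to a single topic preserves transaction boundaries that per-table topics would lose.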
This is it.. https://github.com/gilesbradshaw/node-mssql-cdc-kafka
It's not currently ordering across tables - but the publish step could..
It's unfortunate that you're no longer developing the project.