transactional-datalake-using-apache-iceberg-on-aws-glue

How to handle multiple tables from database source

Open bcolas opened this issue 2 years ago • 2 comments

Hi, any tips for using this example in a real-world case where we have multiple tables to synchronize from the database source? Thanks for your help. Benoit COLAS

bcolas · Jul 20 '23 08:07

Can someone please respond to this man?

NinoSkopac · Jan 23 '24 16:01

There are two ways to handle multiple tables from a database source. First, you can replicate this data pipeline for each table. Alternatively, you can set up AWS DMS to read CDC data from multiple tables (for more information, see "Wildcards in table mapping" in the AWS DMS documentation) and have a single AWS Glue streaming job upsert the streaming CDC data into multiple Apache Iceberg tables.
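As a rough illustration of the second option, a DMS table-mapping document with a wildcard selection rule might be built like this. This is a minimal sketch: the helper name `build_table_mappings`, the schema name `testdb`, and the rule name are hypothetical; `%` is the DMS wildcard that matches any table name.

```python
import json


def build_table_mappings(schema_name, table_pattern="%"):
    """Return a DMS table-mapping document with one selection rule.

    The default pattern "%" matches every table in the schema.
    """
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-matching-tables",  # hypothetical name
                "object-locator": {
                    "schema-name": schema_name,
                    "table-name": table_pattern,
                },
                "rule-action": "include",
            }
        ]
    }


if __name__ == "__main__":
    # "testdb" is a placeholder schema name, not one from this repository.
    print(json.dumps(build_table_mappings("testdb"), indent=2))
```

The resulting JSON is what you would supply as the table mappings of the DMS replication task. A narrower pattern such as `order%` would select only tables whose names start with `order`.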

AWS DMS can capture binlog changes from many tables in a single database. If the Glue streaming job script then repeats the upsert for each source table, one job can handle multiple tables.
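The per-table loop could look roughly like the sketch below. It is not the script from this repository. It assumes each micro-batch has already been parsed into a DataFrame with a `table_name` column identifying the source table, and that rows for a given table share that table's columns. The names in `TABLE_CONFIG`, the `table_name` column, and both helper functions are hypothetical.

```python
# Hypothetical mapping: source table -> (Iceberg target table, primary-key column).
TABLE_CONFIG = {
    "customers": ("glue_catalog.cdc_db.customers", "customer_id"),
    "orders": ("glue_catalog.cdc_db.orders", "order_id"),
}


def build_merge_sql(target, source_view, key):
    """Build a Spark SQL MERGE INTO statement that upserts a view into a table."""
    return (
        f"MERGE INTO {target} t USING {source_view} s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert_batch(batch_df, batch_id, spark, table_config=TABLE_CONFIG):
    """Upsert one micro-batch into every configured Iceberg table.

    Assumes batch_df carries a 'table_name' column (an assumption of this
    sketch); rows are split by that column and merged table by table.
    """
    for source_table, (target, key) in table_config.items():
        # Keep only the rows that came from this source table.
        table_df = batch_df.filter(batch_df["table_name"] == source_table)
        view = f"cdc_{source_table}_{batch_id}"
        table_df.createOrReplaceTempView(view)
        spark.sql(build_merge_sql(target, view, key))
```

In a streaming job this would be wired in with something like `stream_df.writeStream.foreachBatch(lambda df, epoch_id: upsert_batch(df, epoch_id, spark))`. A real job would also need to drop the routing column, deduplicate rows per key within a batch, and handle deletes, all of which are omitted here.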

ksmin23 · Jan 25 '24 04:01