debezium-server-iceberg

Support (or document) Azure Storage as sink

Open karlschriek opened this issue 2 years ago • 9 comments

I am trying to set up a very simple process to stream CDC records from (Azure) SQL Server as Iceberg tables to an Azure Storage Account. I've come across various potential solutions for this, most of which involve chaining several tools together (and using Event Hub at some point).

I would like to be able to go [SQL Server] ----cdc message----> [Debezium Server] ----iceberg table----> [Azure Blob Storage] instead. Is this possible as of today? If so, could we document it somewhere? If not, could we support this?

karlschriek avatar Aug 14 '23 07:08 karlschriek

Hi @karlschriek, yes, this should be possible with the current release (supported). It should be a matter of configuring Hadoop FileIO to write to Azure Blob and configuring the Iceberg catalog to use an Azure Hive server.

ismailsimsek avatar Aug 14 '23 07:08 ismailsimsek

Ok, that sounds promising. Are there any docs anywhere on how to do something like that? Right now this is the only example config I am able to reference, which is very S3-specific:

https://github.com/memiiso/debezium-server-iceberg/blob/3f0649ae880e9bedd2bdff9e43ca5601bda3da0d/debezium-server-iceberg-sink/src/main/resources/conf/application.properties.example

karlschriek avatar Aug 14 '23 10:08 karlschriek

Hmmm, as far as I can see there are currently two unmerged PRs open that would add ADLS as a FileIO, so it doesn't look like it is actually supported right now:

  • https://github.com/apache/iceberg/pull/8303
  • https://github.com/apache/iceberg/pull/4465

karlschriek avatar Aug 14 '23 11:08 karlschriek

It is supported with Hadoop FileIO. I believe these PRs are adding a more direct Azure Storage integration (without Hadoop libraries).

Currently, HadoopFileIO is used to talk to Azure Blob Storage.

ismailsimsek avatar Aug 14 '23 11:08 ismailsimsek

@karlschriek Have you been able to get this up and running with Azure Blob Storage?

@ismailsimsek can you point me to some documentation to help me to get this working on Azure Blob?

ghost avatar Sep 26 '23 18:09 ghost

Could you try the options from https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage, adding debezium.sink.iceberg. as a prefix to each one?

It will also require the Hadoop Azure library, if it's not included currently: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure/3.3.6

ismailsimsek avatar Sep 27 '23 07:09 ismailsimsek
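
A minimal sketch of the prefixing step suggested above, in case it helps anyone trying this. It takes Hadoop Azure (ABFS) options and renders them as application.properties lines with the debezium.sink.iceberg. prefix. The storage account name and the key placeholder are hypothetical, and the choice of shared-key authentication is an assumption; this has not been verified against a running Debezium Server, so treat it as a starting point rather than a confirmed configuration.

```python
# Prefix named in the thread: options carrying it are meant for the Iceberg sink.
SINK_PREFIX = "debezium.sink.iceberg."


def to_sink_properties(hadoop_options: dict[str, str]) -> list[str]:
    """Render Hadoop options as application.properties lines with the sink prefix."""
    return [f"{SINK_PREFIX}{key}={value}" for key, value in hadoop_options.items()]


# Hypothetical storage account host; replace with your own.
ACCOUNT_HOST = "mystorageaccount.dfs.core.windows.net"

# Assumption: shared-key authentication for the ABFS driver from hadoop-azure.
# The value is a placeholder, not a real key.
abfs_options = {
    f"fs.azure.account.key.{ACCOUNT_HOST}": "<storage-account-access-key>",
}

if __name__ == "__main__":
    # Print the lines so they can be pasted into application.properties.
    for line in to_sink_properties(abfs_options):
        print(line)
```

The other authentication options listed on the linked page (for example the OAuth ones) could be added to the same dictionary and would get the same prefix.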

Related to https://github.com/apache/iceberg/issues/8662

ismailsimsek avatar Sep 27 '23 19:09 ismailsimsek

Thanks, I will give this a try once I have my setup in Docker with SQL Server up and running.

ghost avatar Sep 28 '23 06:09 ghost

Leaving an example here: https://github.com/tabular-io/iceberg-kafka-connect?tab=readme-ov-file#azure-adls-configuration-example

ismailsimsek avatar May 21 '24 11:05 ismailsimsek