[destination-iceberg] throws error on sync: slf4j.Log4jLoggerFactory: method 'void <init>()' not found
Connector Name
destination-iceberg
Connector Version
0.1.5
What step the error happened?
During the sync
Relevant information
I set up Airbyte and Iceberg-Spark on a completely new system using Docker. I used the following guides for installation: https://docs.airbyte.com/deploying-airbyte/local-deployment and https://iceberg.apache.org/spark-quickstart/
Relevant log output
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationCliParser(parseOptions):126 - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2024-03-25 09:52:50 source > INFO main c.z.h.HikariDataSource(<init>):79 HikariPool-1 - Starting...
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationRunner(runInternal):132 - Running integration: io.airbyte.integrations.destination.iceberg.IcebergDestination
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationRunner(runInternal):133 - Command: WRITE
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationRunner(runInternal):134 - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2024-03-25 09:52:50 source > INFO main c.z.h.HikariDataSource(<init>):81 HikariPool-1 - Start completed.
2024-03-25 09:52:51 destination > 2024-03-25 09:52:51 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-25 09:52:51 destination > 2024-03-25 09:52:51 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-25 09:52:51 source > INFO main i.a.c.i.s.j.AbstractJdbcSource(logPreSyncDebugData):467 Data source product recognized as Microsoft SQL Server:13.00.4522
2024-03-25 09:52:51 source > INFO main i.a.i.s.m.MssqlQueryUtils(getIndexInfoForStreams):77 Discovering indexes for table "dbo"."Ledger"
2024-03-25 09:52:51 source > INFO main i.a.i.s.m.MssqlQueryUtils(getIndexInfoForStreams):88 Failed to get index for "dbo"."Ledger"
2024-03-25 09:52:51 source > INFO main i.a.c.i.s.j.AbstractJdbcSource(discoverInternal):169 Internal schemas to exclude: [spt_fallback_db, spt_monitor, cdc, spt_values, INFORMATION_SCHEMA, spt_fallback_usg, MSreplication_options, sys, spt_fallback_dev]
2024-03-25 09:52:51 destination > 2024-03-25 09:52:51 ERROR i.a.c.i.b.AirbyteExceptionHandler(uncaughtException):26 - Something went wrong in the connector. See the logs for more details.
2024-03-25 09:52:51 destination > java.lang.NoSuchMethodError: org.apache.logging.slf4j.Log4jLoggerFactory: method 'void <init>()' not found
2024-03-25 09:52:51 destination > at org.slf4j.impl.StaticLoggerBinder.<init>(StaticLoggerBinder.java:53) ~[log4j-slf4j-impl-2.17.2.jar:2.17.2]
2024-03-25 09:52:51 destination > at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:41) ~[log4j-slf4j-impl-2.17.2.jar:2.17.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging$.org$apache$spark$internal$Logging$$isLog4j2(Logging.scala:232) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogging(Logging.scala:129) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary(Logging.scala:115) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary$(Logging.scala:109) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.initializeLogIfNecessary(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary(Logging.scala:106) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary$(Logging.scala:105) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.initializeLogIfNecessary(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.log(Logging.scala:53) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.log$(Logging.scala:51) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.log(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.logWarning(Logging.scala:73) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.logWarning$(Logging.scala:72) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.logWarning(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.$anonfun$findLocalInetAddress$1(Utils.scala:1067) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.$anonfun$findLocalInetAddress$1$adapted(Utils.scala:1057) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at scala.collection.immutable.List.foreach(List.scala:333) ~[scala-library-2.13.10.jar:?]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:1057) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:1040) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:1040) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:1097) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at scala.Option.getOrElse(Option.scala:201) ~[scala-library-2.13.10.jar:?]
Contribute
- [x] Yes, I want to contribute
Thanks for reporting the issue @markus-renezeder. The Iceberg destination is a community connector and it isn't on the current roadmap for improvements. If you want to contribute a fix for the issue, please reach out to me in Slack so I can provide you with instructions to make the contribution 🎖️
I'm seeing this too. Was there some refactoring related to logging that broke this?
Bumping the Spark and Iceberg package versions in build.gradle solved this issue for me:
## build.gradle
..
implementation ('org.apache.spark:spark-sql_2.13:3.5.0') {
    exclude(group: 'org.apache.hadoop', module: 'hadoop-common')
}
implementation ('org.apache.spark:spark-hive_2.13:3.5.0') {
    exclude(group: 'org.apache.hadoop', module: 'hadoop-common')
}
implementation 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0'
..
Then another issue arose with the antlr dependency, since Spark 3.x requires 4.9.3 while the airbyte-cdk builds with 4.10.1, so I forced it (I guess? not a Java dev, btw) to use the proper version:
## build.gradle
..
implementation ('org.antlr:antlr4-runtime') {
    version {
        strictly('4.9.3')
    }
}
..
And this, hacked together with #32720, worked for me, but I have only tried the Nessie catalog version. Maybe someone more well versed in Java development could take a look at it and confirm it (and test it with the other catalogs).
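As an aside on the antlr pin above: another common way to express the same constraint in Gradle is a resolution-strategy rule. This is only a sketch of that alternative, not taken from the connector's actual build, so treat it as untested here:
## build.gradle
..
configurations.all {
    resolutionStrategy {
        // pin antlr4-runtime to the version Spark 3.x expects
        force 'org.antlr:antlr4-runtime:4.9.3'
    }
}
..
Either approach should end up with 4.9.3 on the runtime classpath; the strictly() block just fails the build more loudly if something else demands a different version.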
In which file did you place the second part (antlr 4.9.3)? In the build.gradle of the Iceberg connector?
Exactly there. The whole dependency block looks like this for me (please note that I am also using the Nessie package in there):
## build.gradle
dependencies {
    implementation ('org.apache.spark:spark-sql_2.13:3.5.0') {
        exclude group: 'org.apache.hadoop', module: 'hadoop-common'
    }
    implementation ('org.apache.spark:spark-hive_2.13:3.5.0') {
        exclude group: 'org.apache.hadoop', module: 'hadoop-common'
    }
    // Nessie Version needs to be in sync with the Nessie version in Iceberg.
    implementation 'org.projectnessie.nessie-integrations:nessie-spark-extensions-3.5_2.13:0.79.0'
    implementation 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0'
    // force awssdk version required by Iceberg
    implementation "software.amazon.awssdk:utils:2.20.131"
    implementation "software.amazon.awssdk:url-connection-client:2.20.131"
    implementation "software.amazon.awssdk:s3:2.20.131"
    implementation "software.amazon.awssdk:glue:2.20.131"
    implementation "software.amazon.awssdk:dynamodb:2.20.131"
    implementation "software.amazon.awssdk:kms:2.20.131"
    implementation "software.amazon.awssdk:sts:2.20.131"
    implementation "software.amazon.awssdk:sdk-core:2.20.131"
    implementation "software.amazon.awssdk:aws-core:2.20.131"
    implementation 'org.apache.hadoop:hadoop-aws:3.3.2'
    implementation 'org.apache.hadoop:hadoop-client-api:3.3.2'
    implementation 'org.apache.hadoop:hadoop-client-runtime:3.3.2'
    implementation "org.postgresql:postgresql:42.5.0"
    implementation "commons-collections:commons-collections:3.2.2"
    implementation ('org.antlr:antlr4-runtime') {
        version {
            strictly('4.9.3')
        }
    }
    testImplementation libs.testcontainers.postgresql
    integrationTestJavaImplementation libs.testcontainers.postgresql
    testImplementation 'org.mockito:mockito-inline:4.7.0'
}
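If you want to double-check which versions Gradle actually resolved after these overrides (the antlr pin in particular), the standard Gradle dependency report should show them. Assuming the module exposes the usual Java runtimeClasspath configuration, something like this from the repo root:
./gradlew :airbyte-integrations:connectors:destination-iceberg:dependencies --configuration runtimeClasspath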
Sorry for my questions, I don't know anything about Gradle. Must I build Airbyte myself, or is it done by executing run-ab-platform.sh (i.e., is the connector built during the installation)?
I didn't know anything about gradle until Friday as well, so no worries :)
After the changes, I ran gradlew from the root directory as referenced here: https://docs.airbyte.com/connector-development/testing-connectors/
./gradlew :airbyte-integrations:connectors:destination-iceberg:buildConnectorImage
That built the connector image with the "dev" tag, and I just pointed the Airbyte UI to pull that specific tag in "Settings" -> "Connectors".
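If you're not sure the dev image actually got built, a quick generic Docker check (nothing Airbyte-specific) is:
docker images | grep destination-iceberg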
The fixes I applied should work at least with Nessie, but as I said, I would prefer if someone with more experience took a look at it, because I am not sure whether bumping the package versions breaks the integration in other places.
I was able to solve the problem above using your tips, thank you very much. But now I've run into a new problem:
java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.expressions.AnsiCast
At the moment, I'm not sure what the problem is. I'll have to do some further investigation. Maybe a reference is missing?
But since you're able to use it with Nessie, I'm not sure whether it's a problem in Airbyte or whether it's thrown by the target itself (I'm using this image https://hub.docker.com/r/tabulario/spark-iceberg to run Iceberg and Spark in Docker).
It turns out I just had to type out the correct version - it works! Thank you!
Thanks for reporting this issue. I took a look and upgraded Iceberg/Spark and some other dependencies, as suggested here. https://github.com/airbytehq/airbyte/pull/38283 should fix this.
Thank you for working on this - the iceberg destination is pretty stuck at the moment.