[destination-iceberg] throws error on sync: slf4j.Log4jLoggerFactory: method 'void <init>()' not found
Connector Name
destination-iceberg
Connector Version
0.1.5
What step the error happened?
During the sync
Relevant information
I set up Airbyte and Iceberg-Spark on a completely new system using Docker. I used the following guides for installation: https://docs.airbyte.com/deploying-airbyte/local-deployment and https://iceberg.apache.org/spark-quickstart/
Relevant log output
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationCliParser(parseOptions):126 - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2024-03-25 09:52:50 source > INFO main c.z.h.HikariDataSource(<init>):79 HikariPool-1 - Starting...
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationRunner(runInternal):132 - Running integration: io.airbyte.integrations.destination.iceberg.IcebergDestination
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationRunner(runInternal):133 - Command: WRITE
2024-03-25 09:52:50 destination > 2024-03-25 09:52:50 INFO i.a.c.i.b.IntegrationRunner(runInternal):134 - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2024-03-25 09:52:50 source > INFO main c.z.h.HikariDataSource(<init>):81 HikariPool-1 - Start completed.
2024-03-25 09:52:51 destination > 2024-03-25 09:52:51 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-25 09:52:51 destination > 2024-03-25 09:52:51 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-25 09:52:51 source > INFO main i.a.c.i.s.j.AbstractJdbcSource(logPreSyncDebugData):467 Data source product recognized as Microsoft SQL Server:13.00.4522
2024-03-25 09:52:51 source > INFO main i.a.i.s.m.MssqlQueryUtils(getIndexInfoForStreams):77 Discovering indexes for table "dbo"."Ledger"
2024-03-25 09:52:51 source > INFO main i.a.i.s.m.MssqlQueryUtils(getIndexInfoForStreams):88 Failed to get index for "dbo"."Ledger"
2024-03-25 09:52:51 source > INFO main i.a.c.i.s.j.AbstractJdbcSource(discoverInternal):169 Internal schemas to exclude: [spt_fallback_db, spt_monitor, cdc, spt_values, INFORMATION_SCHEMA, spt_fallback_usg, MSreplication_options, sys, spt_fallback_dev]
2024-03-25 09:52:51 destination > 2024-03-25 09:52:51 ERROR i.a.c.i.b.AirbyteExceptionHandler(uncaughtException):26 - Something went wrong in the connector. See the logs for more details.
2024-03-25 09:52:51 destination > java.lang.NoSuchMethodError: org.apache.logging.slf4j.Log4jLoggerFactory: method 'void <init>()' not found
2024-03-25 09:52:51 destination > at org.slf4j.impl.StaticLoggerBinder.<init>(StaticLoggerBinder.java:53) ~[log4j-slf4j-impl-2.17.2.jar:2.17.2]
2024-03-25 09:52:51 destination > at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:41) ~[log4j-slf4j-impl-2.17.2.jar:2.17.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging$.org$apache$spark$internal$Logging$$isLog4j2(Logging.scala:232) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogging(Logging.scala:129) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary(Logging.scala:115) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary$(Logging.scala:109) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.initializeLogIfNecessary(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary(Logging.scala:106) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.initializeLogIfNecessary$(Logging.scala:105) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.initializeLogIfNecessary(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.log(Logging.scala:53) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.log$(Logging.scala:51) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.log(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.logWarning(Logging.scala:73) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.internal.Logging.logWarning$(Logging.scala:72) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.logWarning(Utils.scala:91) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.$anonfun$findLocalInetAddress$1(Utils.scala:1067) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.$anonfun$findLocalInetAddress$1$adapted(Utils.scala:1057) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at scala.collection.immutable.List.foreach(List.scala:333) ~[scala-library-2.13.10.jar:?]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:1057) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:1040) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:1040) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:1097) ~[spark-core_2.13-3.3.2.jar:3.3.2]
2024-03-25 09:52:51 destination > at scala.Option.getOrElse(Option.scala:201) ~[scala-library-2.13.10.jar:?]
Contribute
- [x] Yes, I want to contribute
Thanks for reporting the issue @markus-renezeder. The Iceberg destination is a community connector and it isn't on the current roadmap for improvements. If you want to contribute a fix for the issue, please reach out to me in Slack so I can provide you with instructions to make the contribution 🎖️
I'm seeing this too. Was there some refactoring related to logging that broke this?
Bumping the Spark and Iceberg package versions in build.gradle solved this issue for me:
## build.gradle
..
implementation ('org.apache.spark:spark-sql_2.13:3.5.0') {
    exclude(group: 'org.apache.hadoop', module: 'hadoop-common')
}
implementation ('org.apache.spark:spark-hive_2.13:3.5.0') {
    exclude(group: 'org.apache.hadoop', module: 'hadoop-common')
}
implementation 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0'
..
Then another issue arose with the antlr dependency, since Spark 3.x requires 4.9.3 while the airbyte-cdk builds with 4.10.1, so I forced it (I guess? not a Java dev, btw) to use the proper version:
## build.gradle
..
implementation ('org.antlr:antlr4-runtime') {
    version {
        strictly('4.9.3')
    }
}
..
And this, hacked together with #32720, worked for me, but I have only tried the Nessie catalog version. Maybe someone more well versed in Java development could take a look at it and confirm it (and test it with the other catalogs).
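As an aside on the antlr pin above: another common way to express the same constraint in Gradle is a resolution-strategy rule. This is only a sketch of that alternative, not taken from the connector's actual build, so treat it as untested here:
## build.gradle
..
configurations.all {
    resolutionStrategy {
        // pin antlr4-runtime to the version Spark 3.x expects
        force 'org.antlr:antlr4-runtime:4.9.3'
    }
}
..
Either approach should end up with 4.9.3 on the runtime classpath; the strictly() block just fails the build more loudly if something else demands a different version.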
In which file did you place the second part (antlr 4.9.3)? In the build.gradle of the Iceberg connector?
Exactly there. The whole dependency block looks like this for me (please note that I am also using the Nessie package in there):
## build.gradle
dependencies {
    implementation ('org.apache.spark:spark-sql_2.13:3.5.0') {
        exclude group: 'org.apache.hadoop', module: 'hadoop-common'
    }
    implementation ('org.apache.spark:spark-hive_2.13:3.5.0') {
        exclude group: 'org.apache.hadoop', module: 'hadoop-common'
    }
    // Nessie Version needs to be in sync with the Nessie version in Iceberg.
    implementation 'org.projectnessie.nessie-integrations:nessie-spark-extensions-3.5_2.13:0.79.0'
    implementation 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0'
    // force awssdk version required by Iceberg
    implementation "software.amazon.awssdk:utils:2.20.131"
    implementation "software.amazon.awssdk:url-connection-client:2.20.131"
    implementation "software.amazon.awssdk:s3:2.20.131"
    implementation "software.amazon.awssdk:glue:2.20.131"
    implementation "software.amazon.awssdk:dynamodb:2.20.131"
    implementation "software.amazon.awssdk:kms:2.20.131"
    implementation "software.amazon.awssdk:sts:2.20.131"
    implementation "software.amazon.awssdk:sdk-core:2.20.131"
    implementation "software.amazon.awssdk:aws-core:2.20.131"
    implementation 'org.apache.hadoop:hadoop-aws:3.3.2'
    implementation 'org.apache.hadoop:hadoop-client-api:3.3.2'
    implementation 'org.apache.hadoop:hadoop-client-runtime:3.3.2'
    implementation "org.postgresql:postgresql:42.5.0"
    implementation "commons-collections:commons-collections:3.2.2"
    implementation ('org.antlr:antlr4-runtime') {
        version {
            strictly('4.9.3')
        }
    }
    testImplementation libs.testcontainers.postgresql
    integrationTestJavaImplementation libs.testcontainers.postgresql
    testImplementation 'org.mockito:mockito-inline:4.7.0'
}
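If you want to double-check which versions Gradle actually resolved after these overrides (the antlr pin in particular), the standard Gradle dependency report should show them. Assuming the module exposes the usual Java runtimeClasspath configuration, something like this from the repo root:
./gradlew :airbyte-integrations:connectors:destination-iceberg:dependencies --configuration runtimeClasspath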
Sorry for my questions, I don't know anything about Gradle. Must I build Airbyte myself, or is it done by executing run-ab-platform.sh (i.e., is the connector built during the installation)?
I didn't know anything about gradle until Friday as well, so no worries :)
After the changes, I ran gradlew from the root directory as referenced here: https://docs.airbyte.com/connector-development/testing-connectors/
./gradlew :airbyte-integrations:connectors:destination-iceberg:buildConnectorImage
That built the connector image with the "dev" tag, and I just pointed the Airbyte UI to pull that specific tag in "Settings" -> "Connectors".
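If you're not sure the dev image actually got built, a quick generic Docker check (nothing Airbyte-specific) is:
docker images | grep destination-iceberg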
The fixes I applied should work at least with Nessie, but as I said, I would prefer if someone with more experience took a look at it, because I am not sure whether bumping the package versions breaks the integration in other places.
I was able to solve the problem above using your tips, thank you very much. But now I've run into a new problem:
java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.expressions.AnsiCast
At the moment, I'm not sure what the problem is. I'll have to do some further investigation. Maybe a reference is missing?
But since you're able to use it with Nessie, I'm not sure whether it's a problem in Airbyte or whether it's thrown by the target itself (I'm using this image https://hub.docker.com/r/tabulario/spark-iceberg to run Iceberg and Spark in Docker).
It turns out I just had to type out the correct version - it works! Thank you!
Thanks for reporting this issue. I took a look and upgraded Iceberg/Spark and some other dependencies, as suggested here. https://github.com/airbytehq/airbyte/pull/38283 should fix this.
Thank you for working on this - the iceberg destination is pretty stuck at the moment.