🐛 Destination Iceberg: Bump Iceberg to 1.5.2 and Spark to 3.5.1
What
This bumps Iceberg to the latest release (1.5.2) and aligns its dependencies. It also updates Spark to 3.5.1.
Bumping the dependencies also fixes https://github.com/airbytehq/airbyte/issues/36441
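For reviewers who want the gist without opening the diff, the bump boils down to dependency coordinates along these lines (a hypothetical sketch, not the exact change; the artifact names and scopes in the connector's build.gradle may differ):

```groovy
dependencies {
    // Iceberg aligned on 1.5.2 across modules
    implementation 'org.apache.iceberg:iceberg-core:1.5.2'
    // Spark runtime artifact matching Spark 3.5.x / Scala 2.12
    implementation 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2'
    // Spark itself updated to 3.5.1
    implementation 'org.apache.spark:spark-sql_2.12:3.5.1'
}
```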
How
Review guide
User Impact
Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO ❌
I've run all unit and integration tests locally and they all passed. I also ran Airbyte locally and tested streaming data from Parquet files to the Iceberg REST catalog (as described in this blogpost).
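For anyone who wants to reproduce a similar setup outside Airbyte, pointing a local Spark at an Iceberg REST catalog looks roughly like this (a hypothetical invocation; the catalog name and endpoint are placeholders, not values from this PR):

```sh
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
  --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rest.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
  --conf spark.sql.catalog.rest.uri=http://localhost:8181
```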
@marcosmarxm since you were reviewing earlier PRs for this destination, could you take a look please? This should also fix the error reported in #36441
@nastra
The Azure-related jars are not included in the Iceberg connector. I tested with the REST catalog and it's failing with the error below. Any suggestions on how to fix this?
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (destination-iceberg-write-52-4-updgj executor driver): java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.iceberg.hadoop.Util.getFs(Util.java:56)
	at org.apache.iceberg.hadoop.HadoopOutputFile.fromPath(HadoopOutputFile.java:53)
	at org.apache.iceberg.hadoop.HadoopFileIO.newOutputFile(HadoopFileIO.java:97)
	at org.apache.iceberg.io.ResolvingFileIO.newOutputFile(ResolvingFileIO.java:92)
```
@VeeraswamyGatta I don't think Azure is currently supported by the Iceberg destination. Also, adding Azure support is out of scope for this PR.
@nastra thanks for the contribution. I need to check the current integration tests for destination-iceberg; I think they're broken today.
@nastra
Thank you. If I want to add Azure support to this connector, do you have any suggestions on how to complete it?
> @nastra thanks for the contribution. I need to check the current integration tests for destination-iceberg; I think they're broken today.
@marcosmarxm they were all passing for me locally when I opened the PR. Do you have any details on what exactly is broken?
> @nastra
> Thank you. If I want to add Azure support to this connector, do you have any suggestions on how to complete it?
@VeeraswamyGatta you would have to add a new storage config similar to how S3 is being handled
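At the dependency level that would roughly mean putting the ABFS filesystem implementation (the class missing in the stack trace above) and Iceberg's Azure module on the classpath, something like this untested sketch (versions are illustrative):

```groovy
dependencies {
    // provides org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem
    implementation 'org.apache.hadoop:hadoop-azure:3.3.6'
    // Iceberg's native Azure Data Lake Storage FileIO
    implementation 'org.apache.iceberg:iceberg-azure:1.5.2'
}
```

The jars alone wouldn't be enough, though; the new storage config would still need to be wired through the destination's spec the way the S3 one is.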
@nastra Thank you
Running CI Tests.
@evantahler can you approve-and-merge this contribution? It looks like folks are able to build it locally and have the connector working. The current tests for Iceberg are timing out or failing. I'd like to unblock users and spend some time in the future putting this connector in a good state. Let me know what you think.
Can you confirm that `airbyte-ci` build works for the connector with these changes? If so, yes, let's merge it!
> Can you confirm that `airbyte-ci` build works for the connector with these changes? If so, yes, let's merge it!
Hey @evantahler, I'm the guy who said on Slack that he'd tried it. Disclaimer: I'm not part of the Airbyte team.
I checked out this PR branch and ran:
```
airbyte git:(bump-iceberg-to-1.5.2) ✗ ./gradlew :airbyte-integrations:connectors:destination-iceberg:buildConnectorImage
...
3: /Users/psole/Library/Caches/pypoetry/virtualenvs/pipelines-5vjYXjim-py3.10/bin/airbyte-ci connectors --name=destination-iceberg --disable-report-auto-open build --use-host-gradle-dist-tar
...
Build connector destination-iceberg - Build destination-iceberg docker image for platform(s) linux/arm64: ✅ was successful (duration: 13.76s)
Build connector destination-iceberg - Load airbyte/destination-iceberg:dev to the local docker host.: 🚀 Start Load airbyte/destination-iceberg:dev to the local docker host.
...
BUILD SUCCESSFUL in 58s
17 actionable tasks: 4 executed, 13 up-to-date
```
Then I loaded the image into my kind k8s cluster and configured a destination connector using it. Finally, I tested a Faker -> Iceberg (dev) sync using Nessie as the REST catalog and MinIO for the bucket.
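For reference, loading the locally built image into kind is a one-liner (the cluster name below is a placeholder; adjust `--name` to match your cluster):

```sh
kind load docker-image airbyte/destination-iceberg:dev --name airbyte
```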
Worked like a charm.
Good enough for me!
/approve-and-merge reason="community connector with verified build and wonky tests"
Let's merge it!