🐛Destination Iceberg: Bump Iceberg to 1.5.2 and Spark to 3.5.1

Open nastra opened this issue 9 months ago • 3 comments

What

This bumps Iceberg to the latest released version (1.5.2) and aligns the related dependencies. It also updates Spark to 3.5.1.

Bumping the dependencies also fixes https://github.com/airbytehq/airbyte/issues/36441

How

Review guide

User Impact

Can this PR be safely reverted and rolled back?

  • [x] YES 💚
  • [ ] NO ❌

nastra avatar May 16 '24 09:05 nastra

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 11, 2024 8:15pm

vercel[bot] avatar May 16 '24 09:05 vercel[bot]

I've run all unit / integration tests locally and all of them passed. I also ran Airbyte locally and tested things where data was being streamed from Parquet files to the Iceberg REST catalog (as described in this blogpost).

@marcosmarxm since you were reviewing earlier PRs for this destination, could you take a look please? This should also fix the error reported in #36441

nastra avatar May 16 '24 11:05 nastra

@nastra

The Azure-related jars are not included in the Iceberg connector. I tested with the REST catalog and it fails with the error below. Any suggestions on fixing this issue?

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (destination-iceberg-write-52-4-updgj executor driver): java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
    at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
    at org.apache.iceberg.hadoop.Util.getFs(Util.java:56)
    at org.apache.iceberg.hadoop.HadoopOutputFile.fromPath(HadoopOutputFile.java:53)
    at org.apache.iceberg.hadoop.HadoopFileIO.newOutputFile(HadoopFileIO.java:97)
    at org.apache.iceberg.io.ResolvingFileIO.newOutputFile(ResolvingFileIO.java:92)
    at
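
Editor's note: the ClassNotFoundException in the trace above indicates that the ABFS filesystem class could not be loaded at runtime. That class ships in the hadoop-azure artifact. As a rough way to confirm whether a given class is visible to the JVM, something like the following probe could be run inside the connector image. The class name `ClasspathProbe` and its method are illustrative only and are not part of the connector.

```java
/** Illustrative helper (not part of the connector): checks whether a class can be loaded. */
public class ClasspathProbe {

    /** Returns true if the named class is visible to this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above
        String abfs = "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem";
        System.out.println(abfs + " present: " + isOnClasspath(abfs));
    }
}
```

If the probe reports the class as missing, the jar providing it is not on the runtime classpath, which matches the failure reported here.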

VeeraswamyGatta avatar May 24 '24 02:05 VeeraswamyGatta

@VeeraswamyGatta I don't think Azure is currently supported by the Iceberg destination. Also it's not in the scope of this PR to add Azure support

nastra avatar May 27 '24 07:05 nastra

@nastra thanks for the contribution. I need to check the current integration tests for destination iceberg, I think they're broken today.

marcosmarxm avatar May 27 '24 12:05 marcosmarxm

@nastra

Thank you. If I want to include Azure support for this connector, do you have any suggestions on how to complete the Azure support?

VeeraswamyGatta avatar May 30 '24 04:05 VeeraswamyGatta

> @nastra thanks for the contribution. I need to check the current integration tests for destination iceberg, I think they're broken today.

@marcosmarxm they were all passing for me locally when opening the PR. Do you have some details on what exactly is broken?

nastra avatar Jun 01 '24 09:06 nastra

> @nastra
>
> Thank you. If I want to include Azure support for this connector, do you have any suggestions on how to complete the Azure support?

@VeeraswamyGatta you would have to add a new storage config similar to how S3 is being handled
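
Editor's note: the thread does not show how the connector's S3 storage config is structured, so the following is only a minimal sketch of what an Azure counterpart might produce, under stated assumptions. The class `AzureStorageConfigSketch` and its fields are hypothetical. The property keys are the standard ones: `fs.azure.account.key.<account>.dfs.core.windows.net` is the Hadoop ABFS shared-key setting, the `spark.hadoop.` prefix forwards a property to the Hadoop configuration, and `spark.sql.catalog.<name>.warehouse` is the Iceberg Spark catalog warehouse location. The hadoop-azure jar would also need to be on the connector's classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical ADLS Gen2 storage config; the names here are assumptions, not connector code. */
public class AzureStorageConfigSketch {
    private final String accountName;
    private final String accountKey;
    private final String container;

    public AzureStorageConfigSketch(String accountName, String accountKey, String container) {
        this.accountName = accountName;
        this.accountKey = accountKey;
        this.container = container;
    }

    /** Host name of the storage account's DFS endpoint. */
    String host() {
        return accountName + ".dfs.core.windows.net";
    }

    /** Builds an abfss:// URI; the path is expected without a leading slash. */
    String warehouseUri(String path) {
        return "abfss://" + container + "@" + host() + "/" + path;
    }

    /** Spark properties for shared-key auth and the catalog's warehouse location. */
    public Map<String, String> sparkConfig(String catalogName, String warehousePath) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("spark.hadoop.fs.azure.account.key." + host(), accountKey);
        props.put("spark.sql.catalog." + catalogName + ".warehouse", warehouseUri(warehousePath));
        return props;
    }
}
```

A caller would merge the returned map into the Spark session configuration alongside the catalog settings. Shared-key auth is used here only because it is the simplest option to illustrate; other ABFS auth modes need different properties.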

nastra avatar Jun 01 '24 09:06 nastra

@nastra Thank you

VeeraswamyGatta avatar Jun 03 '24 03:06 VeeraswamyGatta

Running CI Tests.

marcosmarxm avatar Jun 07 '24 16:06 marcosmarxm

@evantahler can you approve-and-merge this contribution? It looks like folks are able to build it locally and have the connector working. The current tests for Iceberg are either timing out or failing. I'd like to unblock users now and set aside some time in the future to put this connector in a good state. Let me know what you think.

marcosmarxm avatar Jun 14 '24 14:06 marcosmarxm

Can you confirm that airbyte-ci build works for the connector with these changes? If so, yes, let's merge it!

evantahler avatar Jun 14 '24 20:06 evantahler

> Can you confirm that airbyte-ci build works for the connector with these changes? If so, yes, let's merge it!

Hey @evantahler, I'm the guy who said on Slack that he had tried it. Disclaimer: I'm not part of the Airbyte team.

I checked out this PR branch and ran:

airbyte git:(bump-iceberg-to-1.5.2) ✗ ./gradlew :airbyte-integrations:connectors:destination-iceberg:buildConnectorImage
...
3: /Users/psole/Library/Caches/pypoetry/virtualenvs/pipelines-5vjYXjim-py3.10/bin/airbyte-ci connectors --name=destination-iceberg --disable-report-auto-open build --use-host-gradle-dist-tar
...
Build connector destination-iceberg - Build destination-iceberg docker image for platform(s) linux/arm64: ✅ was successful (duration: 13.76s)
Build connector destination-iceberg - Load airbyte/destination-iceberg:dev to the local docker host.: 🚀 Start Load airbyte/destination-iceberg:dev to the local docker host.
...
BUILD SUCCESSFUL in 58s
17 actionable tasks: 4 executed, 13 up-to-date

Then I loaded the image onto my kind k8s cluster and configured a destination connector using it. Finally, I tested a Faker -> Iceberg-dev run using Nessie as a REST catalog and MinIO for the bucket.

Worked like a charm.

zspsole avatar Jun 15 '24 00:06 zspsole

Good enough for me!

evantahler avatar Jun 15 '24 01:06 evantahler

/approve-and-merge reason="community connector with verified build and wonky tests"

evantahler avatar Jun 15 '24 01:06 evantahler

Let's merge it

octavia-approvington avatar Jun 15 '24 01:06 octavia-approvington