
AWS: Fix kryo serialization failure for S3 FileIO

Open singhpk234 opened this issue 3 years ago • 1 comments

About the change

Currently, Spark queries fail when S3FileIO and GlueCatalog are used with KryoSerializer (ref https://github.com/apache/iceberg/issues/5414#issuecomment-1204319969). This happens because the immutable map that holds part of the S3FileIO properties is not serializable with Kryo, and no serializer for it is available in the Twitter Chill library (which Spark also uses). This PR attempts to fix this by using Java collections instead, which Spark can serialize and deserialize because their serializers are present in Chill.
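As a rough illustration of the idea (not the actual S3FileIO change), the sketch below copies whatever map the caller passes in into a plain `java.util.HashMap` before storing it, so the field that gets serialized is always a standard Java collection. The class name `SerializableProps`, the helper `storedMapClass()`, and the property value are hypothetical and exist only for this example. The JDK's `Map.of` stands in for the immutable map type, and no Kryo dependency is used, so this only shows the defensive-copy pattern, not a Kryo round trip.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a FileIO-like class; this is NOT the real S3FileIO code.
public class SerializableProps implements Serializable {
  // Always a plain java.util.HashMap, regardless of what the caller passed in.
  private final Map<String, String> properties;

  public SerializableProps(Map<String, String> props) {
    // Defensive copy: replace any immutable or library-specific map
    // implementation with a standard Java collection.
    this.properties = new HashMap<>(props);
  }

  // Read-only view for callers; the stored field itself stays a HashMap.
  public Map<String, String> properties() {
    return Collections.unmodifiableMap(properties);
  }

  // Exposes the concrete class of the stored map, for illustration only.
  public String storedMapClass() {
    return properties.getClass().getName();
  }

  public static void main(String[] args) {
    // Map.of returns a JDK-internal immutable map implementation (Java 9+).
    Map<String, String> immutable = Map.of("s3.endpoint", "http://localhost:9000");
    SerializableProps io = new SerializableProps(immutable);

    System.out.println("input map class:  " + immutable.getClass().getName());
    System.out.println("stored map class: " + io.storedMapClass());
  }
}
```

The trade-off is one extra copy of the properties at construction time, in exchange for not depending on a serializer being registered for a non-standard map type.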

Solves https://github.com/apache/iceberg/issues/5414

Testing Done

Manual Test

A unit test validates that the change fixes the issue; without the fix, the test fails with the exception mentioned in the ticket: https://github.com/apache/iceberg/issues/5414#issuecomment-1204100668

singhpk234 avatar Aug 04 '22 12:08 singhpk234

This looks like the right fix to me. Thanks, @singhpk234! Could you also add tests for the other FileIO implementations?

rdblue avatar Aug 07 '22 20:08 rdblue

Thanks, @singhpk234!

rdblue avatar Sep 01 '22 22:09 rdblue

Thanks @rdblue @kbendick @nastra @amogh-jahagirdar for your awesome reviews :) !!!

singhpk234 avatar Sep 02 '22 05:09 singhpk234

@singhpk234 I am still facing this issue with iceberg-spark-runtime-3.3_2.12:1.2.0 and software.amazon.awssdk:bundle:2.20.18.

akshay-kokate-06 avatar Apr 13 '23 09:04 akshay-kokate-06

@akshay-kokate-06 what issue are you seeing exactly? Could you provide the stack trace please?

nastra avatar Apr 14 '23 08:04 nastra