
Improve MetricsReporter loading with class loader fallback

Open bk-mz opened this issue 1 month ago • 1 comments

This pull request addresses the issue of class loader discrepancies when loading JAR files via the spark-submit command from remote locations (such as S3 or through Ivy). This discrepancy specifically impacts deployments on EMR clusters with the Iceberg feature enabled.

This change is the result of this Slack thread discussion.

Description:

When executing Spark jobs using the spark-submit command, JAR files loaded from remote locations (e.g., S3 or via Ivy) are placed into a different class loader known as org.apache.spark.util.MutableURLClassLoader. This class loader is a child class loader of the AppClassLoader.

When the EMR Iceberg flag is enabled, as described in the AWS EMR documentation, the Iceberg JAR file resides in the AppClassLoader. In contrast, user code (such as a metrics reporter) is placed in the MutableURLClassLoader. Consequently, Iceberg code cannot load classes from user code, because a parent class loader (AppClassLoader) has no visibility into its child class loader (MutableURLClassLoader).
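To illustrate the fallback idea, here is a minimal sketch, not the actual code in this pull request. It assumes the thread context class loader is the child loader that holds the user JARs (spark-submit normally sets it that way, but treat that as an assumption for your deployment). The class name `ClassLoaderFallbackSketch` and the method `loadWithFallback` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: tries a primary class loader first, then falls back
// to the thread context class loader if the class is not visible there.
final class ClassLoaderFallbackSketch {

  private ClassLoaderFallbackSketch() {}

  static Class<?> loadWithFallback(String className, ClassLoader primary)
      throws ClassNotFoundException {
    // Candidate loaders in lookup order. The second entry is assumed to be
    // the child loader (e.g. MutableURLClassLoader) when run under Spark.
    List<ClassLoader> candidates = new ArrayList<>();
    candidates.add(primary);
    candidates.add(Thread.currentThread().getContextClassLoader());

    ClassNotFoundException firstFailure = null;
    for (ClassLoader loader : candidates) {
      if (loader == null) {
        continue; // skip missing loaders instead of hitting the bootstrap loader
      }
      try {
        return Class.forName(className, true, loader);
      } catch (ClassNotFoundException e) {
        // Keep the first failure and attach later ones for diagnostics.
        if (firstFailure == null) {
          firstFailure = e;
        } else {
          firstFailure.addSuppressed(e);
        }
      }
    }
    if (firstFailure == null) {
      firstFailure = new ClassNotFoundException(className);
    }
    throw firstFailure;
  }

  public static void main(String[] args) throws Exception {
    // The loader that loaded this class stands in for the library's own loader.
    ClassLoader own = ClassLoaderFallbackSketch.class.getClassLoader();
    Class<?> loaded = loadWithFallback("java.util.ArrayList", own);
    System.out.println(loaded.getName());
  }
}
```

Trying the library's own loader first keeps the existing behavior for classes it can already see; the context loader is consulted only when that lookup fails.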

bk-mz, Jun 07 '24 07:06