Build: Free disk space before running action in Spark CI
I've seen Spark CI failures caused by running out of disk space:
org.apache.iceberg.spark.extensions.TestCopyOnWriteMerge > testMergeWithConcurrentTableRefresh[catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hive, default-namespace=default}, format = parquet, vectorized = true, distributionMode = none, branch = test] FAILED
java.lang.AssertionError:
Expecting actual throwable to be an instance of:
java.lang.IllegalStateException
but was:
org.apache.spark.SparkException: Writing job aborted
at org.apache.spark.sql.errors.QueryExecutionErrors$.writingJobAbortedError(QueryExecutionErrors.scala:767)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:409)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:353)
...(41 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed)
at org.apache.iceberg.spark.extensions.TestCopyOnWriteMerge.testMergeWithConcurrentTableRefresh(TestCopyOnWriteMerge.java:148)
org.apache.iceberg.spark.extensions.TestCopyOnWriteMerge > testMergeWithMultipleUpdatesForTargetRowSmallTargetLargeSource[catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hive, default-namespace=default}, format = parquet, vectorized = true, distributionMode = none, branch = test] FAILED
Error: java.lang.AssertionError: [Should 2024-02-18T05:06:00.9975674Z ##[error]No space left on device : '/home/runner/runners/2.313.0/_diag/pages/943a8a72-7ff9-49d1-b4cb-09d7db8a44a1_80d440e4-f54b-5560-0192-53fee83660bc_1.log'
This PR attempts to free unneeded disk space with the free-disk-space action.
In one Spark CI build, this saved 27 GiB:
Run jlumbroso/free-disk-space@…
Run # ======
================================================================================
BEFORE CLEAN-UP:
$ dh -h /
Filesystem Size Used Avail Use% Mounted on
/dev/root 73G 56G 17G 77% /
...
================================================================================
AFTER CLEAN-UP:
$ dh -h /
Filesystem Size Used Avail Use% Mounted on
/dev/root 73G 32G 41G 45% /
...
overall:
********************************************************************************
=> Saved 27GiB
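For context, here is a minimal sketch of how such a clean-up step can be wired in ahead of the build; the job name, runner image, and Gradle command below are illustrative placeholders, not the actual Iceberg Spark CI workflow:

# Illustrative sketch only: job name, runner image, and build command are placeholders.
jobs:
  spark-tests:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      # Remove preinstalled toolchains and caches the build does not need,
      # so the runner starts the tests with more free disk space.
      - name: Free up disk space
        uses: jlumbroso/free-disk-space@main   # pin to a tag or commit SHA in practice
      - name: Run Spark tests
        run: ./gradlew check   # placeholder for the actual test invocation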
@Fokko and @singhpk234, please take a look at your convenience.
@manuzhang: Very nice to see this addition. Have we benchmarked how long the clean-up takes and the overall increase in CI time with this?
@ajantha-bhat It took around two minutes per action run. I suppose the actions run in parallel, so that would also be the overall increase in CI time?
This still seems to be an issue in the latest runs: https://github.com/apache/iceberg/actions/runs/8200693411/job/22427953001
Is it due to running out of disk space? The log is no longer available.
Yep, it was due to disk space. So maybe something in Iceberg Spark 3.3 has a memory leak, and that's how it surfaces.