hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-28954: CI fails intermittently due to ephemeral-storage exhaustion

Open zabetak opened this issue 10 months ago • 2 comments

What changes were proposed in this pull request?

Introduce ephemeral storage request and limit for HDB container based on current usage and cluster capacity.

Why are the changes needed?

Based on recent runs the HDB container, which executes the tests, consumes 10Gi to 15Gi of ephemeral storage. To ensure that pods are scheduled correctly to the GKE nodes that have the necessary capacity we should add an explicit resource request.

Moreover, to avoid malfunctioning PRs/pods affect the overall health of the cluster we set the resource limit to 20Gi that is reasonably high to permit precommits to run fine and can also guard against accidental changes that may cause disk spikes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Monitor requests/limits through the GKE console.

zabetak avatar Jun 03 '25 11:06 zabetak

LGTM +1 There is one test failure which is flaky , re-triggering should pass the tests.

kokila-19 avatar Jun 11 '25 11:06 kokila-19

Thanks everyone for the reviews! Much appreciated :)

zabetak avatar Jun 24 '25 14:06 zabetak