job data cleanup does not work if `pull-staged` strategy selected
**Describe the bug**
If the `pull-staged` strategy is selected, job data won't be cleared after the job finishes, leaving data hanging on the executors. This can lead to old shuffle files piling up on the executors. One way to mitigate this is to configure executors to clean up data more aggressively.
**To Reproduce**
Just run the default Ballista cluster setup.
**Expected behavior**
Shuffle files should be removed when the job finishes or when they are no longer needed.
**Additional context**
- The `push-based` strategy works as expected.
- It might be related to #1175.
- With the `pull-staged` strategy, the executor does not expose a gRPC service, so the scheduler cannot connect to the executor to perform data removal.
- We need to find a good approach to handle this, apart from the executor TTL.
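For reference, the executor-TTL workaround boils down to the executor periodically sweeping its local work directory and deleting anything older than a threshold, whether or not the data is still needed. A minimal sketch of such a sweep, assuming job data lives in per-job directories under a `work_dir` (the layout and TTL value here are illustrative, not Ballista's actual configuration):

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Remove job directories under `work_dir` whose last modification is older
/// than `ttl`. A TTL sweep never knows whether a job is finished; it only
/// knows the data is "old enough".
fn sweep_old_job_data(work_dir: &Path, ttl: Duration) -> std::io::Result<()> {
    let now = SystemTime::now();
    for entry in fs::read_dir(work_dir)? {
        let entry = entry?;
        let path = entry.path();
        if !path.is_dir() {
            continue;
        }
        let modified = entry.metadata()?.modified()?;
        if now.duration_since(modified).unwrap_or_default() > ttl {
            // Shuffle files for this job are past the TTL; drop the whole dir.
            fs::remove_dir_all(&path)?;
        }
    }
    Ok(())
}
```

The drawback is that the TTL has to be tuned: too short and long-running jobs can lose shuffle data they still need, too long and files pile up anyway, which is why a scheduler-driven cleanup would be preferable.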
Hi @milenkovicm, I would like to take this issue. In `PullStaged` mode, the scheduler can't call executors directly to clean job data.
Proposal: extend `PollWorkResult` with `CleanJobDataParams` so that executors receive the job IDs to clean in the next `poll_work` response.
```proto
message PollWorkResult {
  repeated TaskDefinition tasks = 1;
  repeated CleanJobDataParams cleanups = 2; // new field
}
```
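On the executor side, the idea is to walk the `cleanups` returned by each `poll_work` call and delete the corresponding job data locally. A rough sketch of that handling, with the generated protobuf types mirrored as plain structs and a hypothetical `<work_dir>/<job_id>` layout (the helper name and layout are assumptions, not the existing executor API):

```rust
use std::fs;
use std::path::Path;

// Hypothetical mirrors of the generated protobuf types shown above.
struct CleanJobDataParams {
    job_id: String,
}
struct PollWorkResult {
    // tasks: Vec<TaskDefinition>, omitted here
    cleanups: Vec<CleanJobDataParams>,
}

/// Delete the local shuffle data for every job id the scheduler asked us to clean.
fn handle_cleanups(work_dir: &Path, result: &PollWorkResult) {
    for cleanup in &result.cleanups {
        // Assumes job data lives under `<work_dir>/<job_id>`.
        let job_dir = work_dir.join(&cleanup.job_id);
        if job_dir.exists() {
            if let Err(e) = fs::remove_dir_all(&job_dir) {
                eprintln!("failed to remove data for job {}: {e}", cleanup.job_id);
            }
        }
    }
}
```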
If this sounds good, I’ll take the issue.
Hi @KR-bluejay, It does make sense
Thanks! I'll take it.
Hi @milenkovicm, I have implemented the pull-based cleanup, but I'm not sure about two things (currently the PR is still in draft: #1314):
- **Tests**
  The scheduler keeps the cleanup job list, and the executors fetch it via `poll_work`.
  I'm not sure how to properly test this flow.
  Do you have any recommendations or existing test patterns I should follow? (A rough sketch of the queue behaviour I have in mind is at the end of this comment.)
- **User-facing changes**
  As far as I can see, this only adds values during `poll_work` from the scheduler to the executor.
  I don't think there are any user-facing changes.
  Could you confirm if that's correct?
Thanks in advance for your advice!
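On the test side, what I have in mind is roughly to unit-test the scheduler-side bookkeeping in isolation: once a job is marked finished, its id should show up in (and be drained from) the next `poll_work` payload for each executor that ran it. A self-contained sketch of that idea (the `CleanupQueue` type and its methods are hypothetical, not the actual scheduler state API):

```rust
use std::collections::HashMap;

/// Hypothetical scheduler-side bookkeeping: job ids that each executor
/// still has to clean up, drained on the executor's next poll_work.
#[derive(Default)]
struct CleanupQueue {
    pending: HashMap<String, Vec<String>>, // executor_id -> job_ids
}

impl CleanupQueue {
    /// Called when a job finishes: remember it for every executor that ran its tasks.
    fn job_finished(&mut self, job_id: &str, executor_ids: &[String]) {
        for executor_id in executor_ids {
            self.pending
                .entry(executor_id.clone())
                .or_default()
                .push(job_id.to_string());
        }
    }

    /// Called while building a PollWorkResult: hand over and forget the pending ids.
    fn drain_for(&mut self, executor_id: &str) -> Vec<String> {
        self.pending.remove(executor_id).unwrap_or_default()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn finished_job_is_cleaned_exactly_once() {
        let mut queue = CleanupQueue::default();
        queue.job_finished("job-1", &["exec-a".to_string(), "exec-b".to_string()]);

        // The first poll after the job finished carries the cleanup request...
        assert_eq!(queue.drain_for("exec-a"), vec!["job-1".to_string()]);
        // ...and subsequent polls do not repeat it.
        assert!(queue.drain_for("exec-a").is_empty());
        // Executors that have not polled yet still have it pending.
        assert_eq!(queue.drain_for("exec-b"), vec!["job-1".to_string()]);
    }
}
```

This only exercises the scheduler side, so it wouldn't catch problems in how the executor handles the response.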
thanks for the pr @KR-bluejay
- i'm not sure, will have a look
- i don't think this is a user-facing change, does not matter much
will have a look at the pr in next few days, we can discuss then
Got it, thank you for the update! I'll wait for your feedback.
I believe this issue is related to #602