datafusion
datafusion copied to clipboard
refactor(joins::utils): Replace OnceAsync/OnceFut with tokio's OnceCell
Rationale for this change
I wrote this PR mostly to show that it can be done. Replacing OnceFut with a OnceCell has some advantages though:
- It no longer stores a
Futurein the ExecutionPlan, but just the synchronization primitive. In theory this can allow to make the ExecutionPlan implementClonetoo. - One fewer custom synchronization primitive to maintain
What changes are included in this PR?
- Introduce
SharedResultOnceCell- A wrapper around tokio'sOnceCell<SharedResult<Arc<T>>>. Arc's are used all over the place to avoid lifetime issues - Replace usages of OnceAsync(OnceFut) with SharedResultOnceCell (SharedResultPending)
- Adjust variable names and doc comments in all users of OnceFut
- Remove OnceAsync + OnceFut
Are these changes tested?
This PR should not change the behaviour of the join implementations.
Are there any user-facing changes?
No.
Thanks @ctsk -- this idea sounds good -- I think we should run the benchmarks to make sure they don't cause issues. I'll do so shortly
I'm moving this back to the draft stage, because it will conflict with https://github.com/apache/datafusion/pull/15476 once I fix that PR.
I can't see a way to achieve the goals of this PR and https://github.com/apache/datafusion/pull/15476 at the same time with a single OnceCell. I suspect that if I pursue #15476, I can not avoid storing a SendableRecordBatchStream or some other Future in the ExecutionPlan.
So overall, it seems to me like this PR is less clear of a win.
@ctsk thank you so much for this PR (and others) can you share more about what you are doing with DataFusion? If you would like to have a brief introduction video call, feel free to email me at [email protected] and we can setup something
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.