dynamic filter refactor
Which issue does this PR close?
- Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
https://github.com/apache/datafusion/pull/15568#discussion_r2038773841
why the change is equivalent to yours in the high level idea.
- DynamicFilterPhysicalExpr gets initialized at planning time with a known set of children but a placeholder expression (lit(true))
The same.
- with_new_children is called making a new DynamicFilterPhysicalExpr but with the children replaced (let's ignore how that happens internally for now)
We need to replace children, and we can achieve this and get the result by snapshot
- update is called on the original reference with an expression that references the original children. This is propagated to all references, including those with new children, because of the Arc<RwLock<...>>.
We can keep Arc<RwLock<T>> and update the inner or even create another new source filter.
- evaluate is called on one of the references that previously had with_new_children called on it. Since update was called, which swapped out inner, the children of this new inner need to be remapped to the children that we currently expose externally.
We can call evaluate on snapshot because snapshot is already remapped. Your version need to remap for each evaluate called, but snapshot in my version done it once, and we evaluate on it without remap.
The improvement of this chanage
- We have correct
with_new_childrenbecause we update the source filter now. DynamicFilterPhysicalExpris basicallyfilter expression: Arc<dyn PhysicalExpr>>. We have a simple interface with the same capability now.
Concern
reassign_predicate_columnsis replaced bysnapshotbut you think we can't do this kind of change because of API churn. I think this is not an issue because onlyDatafusionArrowPredicateis used.- Do we need Lock for source filter? I think we can create another new
DynamicFilterPhysicalExprat all. But maybe there is some reasons we can't, we can discuss further on this.
Hey @jayzhan211 thank you for putting the work into trying to clarify this.
At this point I think it would be best to wait for #15566 or a PR that replaces it to be merged so that we can work against an actual use case / implementation of these dynamic filters. Otherwise I think it's a bit hard to communicate in such abstract terms. Once we're looking at a concrete use case it will be easier to make a PR to replace this implementation.
Would it be okay with you to wait until that happens to continue this discussion?
Sorry if merging a PR with a bad implementation becomes problematic... luckily it's problematic for us, not end users, since this is all private implementations.
Blockers are gone. I think we can focus on this now
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.