Deeply recursive UNION queries cause stack overflow
Describe the bug
In InfluxDB we sometimes create deeply nested queries like this that cause a stack overflow, even in release builds:
To Reproduce
Download: blowout.zip
And run
datafusion-cli -f blowout.sql
This results in
andrewlamb@Andrews-MacBook-Pro Downloads % datafusion-cli -f blowout.sql
datafusion-cli -f blowout.sql
DataFusion CLI v36.0.0
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
The query looks like this
SELECT
(SELECT 1)
UNION ALL (SELECT 1)
UNION ALL (SELECT 1)
UNION ALL (SELECT 1)
.... (works with 2500 clauses, overflow with 3000)
UNION ALL (SELECT 1)
Expected behavior
a runtime error rather than stack overflow. Bonus points if the query actually completed
Additional context
It may be possible to change the treenode recursion to use an iterative approach rather than a recursive one (aka keep a worklist), much as we did in the sql parser to protect against deeply nested expressions
Note it is important to use a release build (as the debug build fails earlier in the process)
In a release build, the stack looks like this:
<alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter 0x000000010372c65c
core::iter::adapters::try_process 0x00000001037aaffc
datafusion_common::tree_node::TreeNode::transform_up 0x0000000103818c4c
<alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter 0x000000010372c684
core::iter::adapters::try_process 0x00000001037aaffc
datafusion_common::tree_node::TreeNode::transform_up 0x0000000103818c4c
<alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter 0x000000010372c684
.... MANY REPEATS ....
datafusion_common::tree_node::TreeNode::transform_up 0x0000000103818c4c
datafusion_optimizer::analyzer::Analyzer::execute_and_check 0x00000001035ed1b0
datafusion::execution::context::SessionState::optimize 0x00000001035e487c
datafusion::dataframe::DataFrame::create_physical_plan::{{closure}} 0x0000000102e35e5c
datafusion_cli::exec::exec_and_print::{{closure}} 0x0000000102e3ffa8
datafusion_cli::exec::exec_from_lines::{{closure}} 0x0000000102e41efc
datafusion_cli::exec::exec_from_files::{{closure}} 0x0000000102e41a38
datafusion_cli::main_inner::{{closure}} 0x0000000102e5d1b4
tokio::runtime::park::CachedParkThread::block_on 0x0000000102e56f64
tokio::runtime::runtime::Runtime::block_on 0x0000000102f72e7c
datafusion_cli::main 0x0000000102f004d0
std::sys_common::backtrace::__rust_begin_short_backtrace 0x0000000102f2c4f0
std::rt::lang_start::{{closure}} 0x0000000102f5b3f0
[Inlined] core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once function.rs:284
[Inlined] std::panicking::try::do_call panicking.rs:552
[Inlined] std::panicking::try panicking.rs:516
[Inlined] std::panic::catch_unwind panic.rs:142
[Inlined] std::rt::lang_start_internal::{{closure}} rt.rs:148
[Inlined] std::panicking::try::do_call panicking.rs:552
[Inlined] std::panicking::try panicking.rs:516
[Inlined] std::panic::catch_unwind panic.rs:142
std::rt::lang_start_internal rt.rs:148
main 0x0000000102f005d8
start 0x00000001825fd0e0
take
I believe there are two ways to fully resolve this issue:
-
Switch from the recursion to iteration like https://github.com/apache/datafusion/pull/6360 and https://github.com/apache/datafusion/pull/10023. To do this, we would need to process nodes via VecDeque or Vec. We also need to reimplement support for
TreeNodeRecursionso users can still control the processing flow. -
Keep the recursion but use something like https://github.com/DelSkayn/reblessive. The downside is adding another dependency, but on the plus side, it will make fixing stack overflow bugs easier in the future. That way, the current
TreeNodeRecursionimplementation would still work.
I will go with the first option, because it was suggested by @alamb, but open to discussing this further if needed 🙂
I agree -- stuff like `rebelssive' (and stacker is another) seem to me like a workaround for a more fundamental algorithmic design issue. Thank you for taking this on @blaginin
Just wanted to share my thoughts on the task before I bombard you with PRs 😅
I think ConcreteTreeNode and DynTreeNode are the best places to write iterative algorithms. They already have methods to get and replace children, so we can flatten and update them iteratively. Re-implementing non-recursive visit/rewrite/transform methods for ConcreteTreeNode and DynTreeNode feels like a good first step to me.
Once this is done, we will need to implement either of those two traits for LogicalPlan:
Because LogicalPlan stores its children in Arcs, it feels natural to implement support for DynTreeNode. But I think this approach will be slow. Right now, when we do map_children, we destruct self before invoking the given function on the Arc::unwrap_or_clone(Arc<LogicalPlan>). This means that most likely, we just unwrap the child logical plan without copy because no one else is probably referencing it. However, if we implement support for the DynTreeNode, we won't be able to pull off the same trick with zero copy. The node will still be referenced by its parent, so unwrap_or_clone will become costly.
Hence, I think ConcreteTreeNode will probably be a better candidate. The transition will be a bit tricky, because currently LogicalPlan stores only shared references to the objects. But I think we can update this with not a lot of API changes.
I see two options for that:
-
To me personally, it feels natural for
LogicalPlanto own its children, for example by storing them inBoxrather than inArc. Even now, if the child plan is referenced multiple times, we'll copy it on rewrite anyway, so I don't think this proposal will increase the clone rate. But on the negative side, we'll have to changeLogicalPlan,Subquery,Exprinner types and some methods... -
Alternatively, we can switch to capturing state in closures, for example:
pub enum MaybeBarren<T> {
Present(T),
// node has been destructed, but can be re-constructed when we get its children
Barren(Box<dyn FnOnce(Vec<T>) -> Result<T>>),
}
There will still be some struct and methods changes, but probably less than in the option one. But then it will be harder to implement functions like transform_down_up_with_subqueries
To me personally, it feels natural for LogicalPlan to own its children
I do think that has been brought up many times before. Now that we have a better infrastructure for rewriting plans without forcing copies I think it might make more sense than it did previously (where we had to have relatively cheap LogicalPlan::clone)