datafusion Deeply recursive UNION queries cause stack overflow

Describe the bug

In InfluxDB we sometimes create deeply nested queries like this that cause a stack overflow, even in release builds:

To Reproduce

And run

datafusion-cli -f blowout.sql

This results in

andrewlamb@Andrews-MacBook-Pro Downloads % datafusion-cli -f blowout.sql
datafusion-cli -f blowout.sql
DataFusion CLI v36.0.0

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

The query looks like this

SELECT
  (SELECT 1)
  UNION ALL (SELECT 1)
  UNION ALL (SELECT 1)
  UNION ALL (SELECT 1)
.... (works with 2500 clauses, overflow with 3000)
  UNION ALL (SELECT 1)

Expected behavior

a runtime error rather than stack overflow. Bonus points if the query actually completed

Additional context

It may be possible to change the treenode recursion to use an iterative approach rather than a recursive one (aka keep a worklist), much as we did in the sql parser to protect against deeply nested expressions

Note it is important to use a release build (as the debug build fails earlier in the process)

In a release build, the stack looks like this:

<alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter 0x000000010372c65c
core::iter::adapters::try_process 0x00000001037aaffc
datafusion_common::tree_node::TreeNode::transform_up 0x0000000103818c4c
<alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter 0x000000010372c684
core::iter::adapters::try_process 0x00000001037aaffc
datafusion_common::tree_node::TreeNode::transform_up 0x0000000103818c4c
<alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter 0x000000010372c684
.... MANY REPEATS ....
datafusion_common::tree_node::TreeNode::transform_up 0x0000000103818c4c
datafusion_optimizer::analyzer::Analyzer::execute_and_check 0x00000001035ed1b0
datafusion::execution::context::SessionState::optimize 0x00000001035e487c
datafusion::dataframe::DataFrame::create_physical_plan::{{closure}} 0x0000000102e35e5c
datafusion_cli::exec::exec_and_print::{{closure}} 0x0000000102e3ffa8
datafusion_cli::exec::exec_from_lines::{{closure}} 0x0000000102e41efc
datafusion_cli::exec::exec_from_files::{{closure}} 0x0000000102e41a38
datafusion_cli::main_inner::{{closure}} 0x0000000102e5d1b4
tokio::runtime::park::CachedParkThread::block_on 0x0000000102e56f64
tokio::runtime::runtime::Runtime::block_on 0x0000000102f72e7c
datafusion_cli::main 0x0000000102f004d0
std::sys_common::backtrace::__rust_begin_short_backtrace 0x0000000102f2c4f0
std::rt::lang_start::{{closure}} 0x0000000102f5b3f0
[Inlined] core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once function.rs:284
[Inlined] std::panicking::try::do_call panicking.rs:552
[Inlined] std::panicking::try panicking.rs:516
[Inlined] std::panic::catch_unwind panic.rs:142
[Inlined] std::rt::lang_start_internal::{{closure}} rt.rs:148
[Inlined] std::panicking::try::do_call panicking.rs:552
[Inlined] std::panicking::try panicking.rs:516
[Inlined] std::panic::catch_unwind panic.rs:142
std::rt::lang_start_internal rt.rs:148
main 0x0000000102f005d8
start 0x00000001825fd0e0

Feb 27 '24 19:02 alamb

take

Oct 15 '24 17:10 blaginin

I believe there are two ways to fully resolve this issue:

Switch from the recursion to iteration like https://github.com/apache/datafusion/pull/6360 and https://github.com/apache/datafusion/pull/10023. To do this, we would need to process nodes via VecDeque or Vec. We also need to reimplement support for TreeNodeRecursion so users can still control the processing flow.
Keep the recursion but use something like https://github.com/DelSkayn/reblessive. The downside is adding another dependency, but on the plus side, it will make fixing stack overflow bugs easier in the future. That way, the current TreeNodeRecursion implementation would still work.

I will go with the first option, because it was suggested by @alamb, but open to discussing this further if needed 🙂

Oct 15 '24 17:10 blaginin

I agree -- stuff like `rebelssive' (and stacker is another) seem to me like a workaround for a more fundamental algorithmic design issue. Thank you for taking this on @blaginin

Oct 15 '24 19:10 alamb

Just wanted to share my thoughts on the task before I bombard you with PRs 😅

I think ConcreteTreeNode and DynTreeNode are the best places to write iterative algorithms. They already have methods to get and replace children, so we can flatten and update them iteratively. Re-implementing non-recursive visit/rewrite/transform methods for ConcreteTreeNode and DynTreeNode feels like a good first step to me.

Once this is done, we will need to implement either of those two traits for LogicalPlan:

Because LogicalPlan stores its children in Arcs, it feels natural to implement support for DynTreeNode. But I think this approach will be slow. Right now, when we do map_children, we destruct self before invoking the given function on the Arc::unwrap_or_clone(Arc<LogicalPlan>). This means that most likely, we just unwrap the child logical plan without copy because no one else is probably referencing it. However, if we implement support for the DynTreeNode, we won't be able to pull off the same trick with zero copy. The node will still be referenced by its parent, so unwrap_or_clone will become costly.

Hence, I think ConcreteTreeNode will probably be a better candidate. The transition will be a bit tricky, because currently LogicalPlan stores only shared references to the objects. But I think we can update this with not a lot of API changes.

I see two options for that:

To me personally, it feels natural for LogicalPlan to own its children, for example by storing them in Box rather than in Arc. Even now, if the child plan is referenced multiple times, we'll copy it on rewrite anyway, so I don't think this proposal will increase the clone rate. But on the negative side, we'll have to change LogicalPlan, Subquery, Expr inner types and some methods...
Alternatively, we can switch to capturing state in closures, for example:

pub enum MaybeBarren<T> {
    Present(T),
    // node has been destructed, but can be re-constructed when we get its children
    Barren(Box<dyn FnOnce(Vec<T>) -> Result<T>>),
}

There will still be some struct and methods changes, but probably less than in the option one. But then it will be harder to implement functions like transform_down_up_with_subqueries

Oct 19 '24 18:10 blaginin

To me personally, it feels natural for LogicalPlan to own its children

I do think that has been brought up many times before. Now that we have a better infrastructure for rewriting plans without forcing copies I think it might make more sense than it did previously (where we had to have relatively cheap LogicalPlan::clone)

Oct 22 '24 15:10 alamb