embucket-labs

engine: Vanilla one tokio runtime

Open DanCodedThis opened this issue 3 months ago • 1 comments

  • Remove dedicated_executor
  • Use only one vanilla tokio runtime, no thread pools, just the basics (for now)
  • See if it increases performance (query execution speed)
  • Investigate if it relaxes the memory problems
  • Check whether having multiple runtimes is what causes the memory pools to work incorrectly
  • Gather data on how many threads the io-runtime and cpu-runtime pools should each get for full saturation: 1/3 vs 2/3, 1/4 vs 3/4, or some other split
  • After that, decide whether we should introduce a second runtime and divide io and cpu tasks between the two
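
To make the thread-split question above concrete, here is a minimal sketch that uses only the standard library. `split_threads` and `available_threads` are hypothetical helpers, not part of the current codebase; the sketch only computes candidate pool sizes for a given I/O fraction and says nothing about which split actually saturates the machine.

```rust
use std::thread;

/// Hypothetical helper: split `total` worker threads between an I/O pool
/// and a CPU pool. `io_num / io_den` is the fraction given to I/O
/// (for example 1/3 or 1/4). Both pools always get at least one thread.
fn split_threads(total: usize, io_num: usize, io_den: usize) -> (usize, usize) {
    assert!(io_den > 0 && io_num <= io_den, "fraction must be in [0, 1]");
    assert!(total >= 2, "need at least two threads to split");
    // Integer division rounds down; clamp keeps both pools non-empty.
    let io = (total * io_num / io_den).clamp(1, total - 1);
    (io, total - io)
}

/// Hardware threads reported by the OS, with a floor of 2 so that
/// `split_threads` can always be applied to the result.
fn available_threads() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(2)
        .max(2)
}
```

For example, `split_threads(available_threads(), 1, 3)` gives the (io, cpu) sizes for the 1/3 vs 2/3 variant. Those numbers could then be passed to each runtime's worker-thread setting while benchmarking the candidate ratios.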

DanCodedThis avatar Sep 25 '25 14:09 DanCodedThis

So do we have any observations on how this works? Here are my thoughts on the subject, though I haven't collected evidence confirmed by tests:

  • Without a dedicated executor, no extra runtime is created when a query executes.
    • For heavy synchronous computations / I/O tasks we should use spawn_blocking
    • More responsibility falls on us for what we run and how we run it
    • No performance increase, unless we have a lot of inter-thread communication
    • Performance can degrade if the runtime is misused (thread::sleep, for example)
  • With a dedicated executor (a dedicated runtime per query)
    • Slight overhead (creating threads and their memory) when we spawn separate queries
    • The main execution runtime is not blocked at all, no matter what happens during query execution
    • Easier to implement a resource harness? Though it seems datafusion already does this for memory

@rampage644 may be interested

YaroslavLitvinov avatar Oct 14 '25 11:10 YaroslavLitvinov