embucket-labs
engine: Vanilla one tokio runtime
- Remove dedicated_executor
- Use only one vanilla tokio runtime, no thread pools, just the basics (for now)
- See if it increases performance (query execution speed)
- Investigate if it relaxes the memory problems
- Check whether having multiple runtimes is what causes the memory pools to work incorrectly
- Gather data on how many threads the io-runtime and cpu-runtime pools should each get for full saturation: 1/3 vs 2/3, 1/4 vs 3/4, or some other split
- Afterwards, decide whether we should introduce a second runtime and divide I/O and CPU tasks between the two
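To make the thread-split question concrete, here is a minimal sketch of how a candidate ratio could be turned into pool sizes. The helper names `split_threads` and `available_threads` are hypothetical and not part of the codebase; the resulting counts would then be passed to something like `tokio::runtime::Builder::worker_threads` for each runtime while benchmarking.

```rust
use std::thread;

/// Hypothetical helper: split `total` worker threads between an I/O pool
/// and a CPU pool. `io_num / io_den` is the fraction given to I/O
/// (for example 1/3 or 1/4). Returns `(io_threads, cpu_threads)`.
pub fn split_threads(total: usize, io_num: usize, io_den: usize) -> (usize, usize) {
    assert!(io_den > 0 && io_num <= io_den, "fraction must be within 0..=1");
    if total < 2 {
        // Too few threads to split: each pool still needs one thread,
        // so the machine ends up oversubscribed.
        return (1, 1);
    }
    // Integer division rounds down; clamp so that neither pool is empty.
    let io = (total * io_num / io_den).clamp(1, total - 1);
    (io, total - io)
}

/// Number of hardware threads, falling back to 1 if it cannot be determined.
pub fn available_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

For example, `split_threads(available_threads(), 1, 3)` gives the 1/3 vs 2/3 split for the current machine. Which ratio actually saturates the hardware is exactly what the benchmark would need to show.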
So do we have any observations on how this works? Here are my thoughts on the subject, though I haven't collected evidence confirmed by tests:
- Without a dedicated executor (no extra runtime is created when a query executes):
  - Heavy synchronous computations and blocking I/O tasks should go through spawn_blocking
  - More responsibility for what we run and how
  - No performance increase, except when there is a lot of inter-thread communication
  - Performance can degrade if misused (thread::sleep, for example)
- With a dedicated executor (a dedicated runtime per query):
  - Slight overhead (thread creation and memory) when we spawn separate queries
  - The main execution runtime is never blocked, no matter what happens during query execution
  - Easier to implement a resource harness? Though it seems DataFusion already does this for memory
@rampage644 might be interested