bench spawn, spawn_blocking, unblock
I noticed we just use `tokio::spawn` for our `spawn_cpu` tasks in the builtin runtime. I threw together a simple microbench; I'm wondering if we should use `unblock` or `spawn_blocking` instead?
We use `spawn` because it follows DataFusion's CPU scheduling logic and makes the most sense.
`spawn_blocking` isn't really for CPU-heavy workloads. It's for I/O-bound but blocking workloads. In other words, it's OK to have 500 threads running these tasks.
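To make that distinction concrete, here's a hypothetical std-only sketch (not Vortex code) of why oversubscription is fine for blocking-but-idle work: threads parked in a wait cost almost no CPU, which is exactly the workload `spawn_blocking` targets.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Simulate n "blocking I/O" tasks, each parked in a wait rather than
// burning CPU -- the kind of task spawn_blocking is meant for.
fn run_blocking_waits(n: usize, wait: Duration) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(move || thread::sleep(wait)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    // All 100 waits overlap: wall time is roughly one wait, not 100 * 50ms,
    // and the parked threads leave the CPUs free for real work.
    let elapsed = run_blocking_waits(100, Duration::from_millis(50));
    println!("elapsed: {elapsed:?}");
}
```

The same experiment with 100 CPU-bound loops instead of sleeps would behave very differently: the threads would fight over the cores rather than overlap for free.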
Codecov Report
:x: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 85.04%. Comparing base (69ef61d) to head (35af782).
:warning: Report is 30 commits behind head on develop.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| vortex-io/src/runtime/tokio.rs | 50.00% | 1 Missing :warning: |
> `spawn_blocking` isn't really for CPU-heavy workloads. It's for I/O-bound but blocking workloads.
I thought it was explicitly for CPU work, and that's why the `spawn_blocking` interface takes a function and not a `Future`? If you spawn enough expensive work to block all the runtime threads, you're going to start crushing your tail latencies, won't you?
🤷 it's a bit of both?
From the Tokio docs for `spawn_blocking`:

> Tokio will spawn more blocking threads when they are requested through this function until the upper limit configured on the Builder is reached. This limit is very large by default, because spawn_blocking is often used for various kinds of IO operations that cannot be performed asynchronously. When you run CPU-bound code using spawn_blocking, you should keep this large upper limit in mind; to run your CPU-bound computations on only a few threads, you should use a separate thread pool such as rayon rather than configuring the number of blocking threads.
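A minimal std-only sketch of what those docs suggest: a fixed-size pool (rayon gives you this off the shelf) that caps CPU-bound work at a few threads instead of Tokio's large blocking-thread limit. `CpuPool` and its methods are hypothetical names for illustration, not a Vortex or Tokio API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

// Hypothetical fixed-size CPU pool: CPU-bound jobs run on exactly `size`
// threads, independent of how many blocking threads the runtime allows.
struct CpuPool {
    tx: mpsc::Sender<Job>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl CpuPool {
    fn new(size: usize) -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        let workers = (0..size)
            .map(|_| {
                let rx = Arc::clone(&rx);
                thread::spawn(move || loop {
                    // The mutex guard is dropped before the job runs, so
                    // workers only contend on the channel, not on each
                    // other's jobs.
                    let job = match rx.lock().unwrap().recv() {
                        Ok(job) => job,
                        Err(_) => break, // sender dropped: shut down
                    };
                    job();
                })
            })
            .collect();
        CpuPool { tx, workers }
    }

    fn spawn_cpu(&self, job: impl FnOnce() + Send + 'static) {
        self.tx.send(Box::new(job)).unwrap();
    }

    fn shutdown(self) {
        drop(self.tx); // close the channel so workers exit their loops
        for w in self.workers {
            w.join().unwrap();
        }
    }
}

fn main() {
    let pool = CpuPool::new(2);
    let done = Arc::new(AtomicUsize::new(0));
    for _ in 0..8 {
        let done = Arc::clone(&done);
        pool.spawn_cpu(move || {
            done.fetch_add(1, Ordering::SeqCst); // stand-in for compressing a chunk
        });
    }
    pool.shutdown();
    println!("8 CPU jobs ran on 2 dedicated threads: {}", done.load(Ordering::SeqCst));
}
```

In an async context you'd complete the picture with a oneshot channel back to the awaiting task, which is essentially what `unblock` does internally.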
That's why I kept them separate in the Vortex Handle interface, so we could route them to different locations later. I wouldn't be against e.g. TokioRuntime holding an optional second handle to a different runtime / thread pool that we use for CPU tasks.
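As a sketch of that routing idea (the names here are hypothetical, not the actual `Handle` API): call sites only ever see `spawn_cpu`, so the executor behind it can change later without touching them.

```rust
use std::thread;

// Hypothetical handle: CPU tasks go through their own entry point, so the
// backing executor can be swapped later without changing call sites.
struct Runtime {
    // false => run CPU tasks where async tasks run (today's tokio::spawn);
    // true  => hand off to a dedicated pool (stood in for by a raw thread).
    dedicated_cpu: bool,
}

impl Runtime {
    fn spawn_cpu<T, F>(&self, f: F) -> T
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        if self.dedicated_cpu {
            // Stand-in for handing off to a separate CPU thread pool.
            thread::spawn(f).join().unwrap()
        } else {
            // Stand-in for running on the async worker threads.
            f()
        }
    }
}

fn main() {
    // Same call site, either routing policy.
    for dedicated_cpu in [false, true] {
        let rt = Runtime { dedicated_cpu };
        assert_eq!(rt.spawn_cpu(|| (0..100u64).sum::<u64>()), 4950);
    }
    println!("spawn_cpu routed both ways");
}
```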
I set up a simple test case, streaming hits_83.parquet from file through the writer into a Vortex file in tmpfs: https://github.com/vortex-data/vortex/pull/5187/files#diff-8d942868d0453754f6247ebafffd3ea2c08c5c051dbe7c9536619c026cd9548d
There were 3 knobs:
- File system: S3 (~100ms avg latency from my laptop) vs local FS
- Use of `spawn` vs `unblock` for executing the CPU tasks
- Number of Tokio worker threads (1, 2, 4, 8)
The results for the S3 test:
In chart form:

Results for the Local FS test:

In chart form:

For the smaller worker thread counts, the `unblock` version performs considerably better, which isn't too surprising: that's the situation where you're most likely to get worker-thread starvation from the CPU tasks.
With some println debugging, it appears that the CPU tasks spawned inside the CompressingStrategy can take > 5ms to compress a chunk, which in `spawn` mode is time spent blocking the executor. We spawn a task per chunk per column, so the number of blocking tasks grows with the width of the schema.
The performance difference goes away with larger thread counts, but that doesn't mean it isn't structurally present; it just means we haven't yet thrown a workload at it that the default number of Tokio worker threads can't absorb.