
Optimize SQLite's completion happy path when setting `completed` without metadata

brandur opened this issue 7 months ago

This one follows up #870 by adding an optimization for job completion: we separate out the most common case, jobs being set to `completed` with no metadata changes required, update all of those in a single simplified batch query, and then handle the rest of the completions afterwards.
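To make the shape of the change concrete, here's a minimal sketch of the split as I understand it. All the names here (`JobCompletion`, `completeJobs`, the `river_job` columns) are placeholders for illustration, not River's actual internals:

```go
package completer

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
	"time"
)

// JobCompletion is a hypothetical stand-in for the completer's per-job
// parameters; River's real types look different.
type JobCompletion struct {
	ID              int64
	State           string
	MetadataUpdates map[string]any
}

// completeJobs splits completions into the common "completed, no metadata"
// case, handled with one simplified batch UPDATE, and everything else,
// which falls through to the existing per-job path.
func completeJobs(ctx context.Context, tx *sql.Tx, completions []JobCompletion) error {
	var fastIDs []int64
	var slow []JobCompletion

	for _, c := range completions {
		if c.State == "completed" && len(c.MetadataUpdates) == 0 {
			fastIDs = append(fastIDs, c.ID)
		} else {
			slow = append(slow, c)
		}
	}

	if len(fastIDs) > 0 {
		// One batch UPDATE for all plain completions. Note every job gets
		// the same finalized_at here, which is the limitation discussed in
		// the comments below.
		placeholders := make([]string, len(fastIDs))
		args := make([]any, 0, len(fastIDs)+1)
		args = append(args, time.Now())
		for i, id := range fastIDs {
			placeholders[i] = "?"
			args = append(args, id)
		}
		query := fmt.Sprintf(
			"UPDATE river_job SET state = 'completed', finalized_at = ? WHERE id IN (%s)",
			strings.Join(placeholders, ", "),
		)
		if _, err := tx.ExecContext(ctx, query, args...); err != nil {
			return err
		}
	}

	// Remaining completions (errors, retries, metadata updates, etc.) keep
	// going through the existing one-at-a-time path.
	for _, c := range slow {
		if err := completeJobSlow(ctx, tx, c); err != nil {
			return err
		}
	}
	return nil
}

// completeJobSlow stands in for the existing per-job completion logic.
func completeJobSlow(ctx context.Context, tx *sql.Tx, c JobCompletion) error {
	_ = c // per-job UPDATE with state/metadata handling goes here
	return nil
}
```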

In any non-degenerate queue, most completions will be setting a success state, so this should help real-world use, but it also significantly improves SQLite's benchmark numbers.

Here's a new benchmark run where throughput is ~4x what it was before and roughly on par with Postgres:

$ go run ./cmd/river bench --database-url "sqlite://:memory:" --num-total-jobs 1_000_000
bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
bench: jobs worked [      88218 ], inserted [          0 ], job/sec [    44109.0 ] [2s]
bench: jobs worked [      91217 ], inserted [          0 ], job/sec [    45608.5 ] [2s]
bench: jobs worked [      88858 ], inserted [          0 ], job/sec [    44429.0 ] [2s]
bench: jobs worked [      77219 ], inserted [          0 ], job/sec [    38609.5 ] [2s]
bench: jobs worked [      82045 ], inserted [          0 ], job/sec [    41022.5 ] [2s]
bench: jobs worked [      84052 ], inserted [          0 ], job/sec [    42026.0 ] [2s]
bench: jobs worked [      72028 ], inserted [          0 ], job/sec [    36014.0 ] [2s]
bench: jobs worked [      90047 ], inserted [          0 ], job/sec [    45023.5 ] [2s]
bench: jobs worked [      88875 ], inserted [          0 ], job/sec [    44437.5 ] [2s]
bench: jobs worked [      89240 ], inserted [          0 ], job/sec [    44620.0 ] [2s]
bench: jobs worked [      88842 ], inserted [          0 ], job/sec [    44421.0 ] [2s]
bench: jobs worked [      59359 ], inserted [          0 ], job/sec [    29679.5 ] [2s]
bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    42822.8 ], running 23.35203575s

Here's the same run against a normal file-based (non-memory) database:

$ go run ./cmd/river bench --database-url "sqlite://./sqlite/bench.sqlite3" --num-total-jobs 1_000_000
bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
bench: jobs worked [      83657 ], inserted [          0 ], job/sec [    41828.5 ] [2s]
bench: jobs worked [      76648 ], inserted [          0 ], job/sec [    38324.0 ] [2s]
bench: jobs worked [      88036 ], inserted [          0 ], job/sec [    44018.0 ] [2s]
bench: jobs worked [      75473 ], inserted [          0 ], job/sec [    37736.5 ] [2s]
bench: jobs worked [      82604 ], inserted [          0 ], job/sec [    41302.0 ] [2s]
bench: jobs worked [      84048 ], inserted [          0 ], job/sec [    42024.0 ] [2s]
bench: jobs worked [      85508 ], inserted [          0 ], job/sec [    42754.0 ] [2s]
bench: jobs worked [      90580 ], inserted [          0 ], job/sec [    45290.0 ] [2s]
bench: jobs worked [      83568 ], inserted [          0 ], job/sec [    41784.0 ] [2s]
bench: jobs worked [      86062 ], inserted [          0 ], job/sec [    43031.0 ] [2s]
bench: jobs worked [      88508 ], inserted [          0 ], job/sec [    44254.0 ] [2s]
bench: jobs worked [      75308 ], inserted [          0 ], job/sec [    37654.0 ] [2s]
bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    42331.9 ], running 23.622860125s

The improved benchmarks only apply to fixed job burndown mode (with the --num-total-jobs option) because inserting jobs is still pretty slow; inserts are still done one at a time.

Once again, I'm pretty sure I'll be able to land some SQLite fixes that'll make batch operations possible using json_each, at which point we should be able to run all the normal operations as batches. That'll take some time though, and we can get this optimization out in time for the initial SQLite release.
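For reference, here's roughly the shape a json_each-based batch insert could take in SQLite. The column names and JSON parameter layout are assumptions for illustration only, not the actual query we'd ship:

```go
// Hypothetical shape of a json_each-based batch insert; schema columns and
// parameter layout are assumptions for illustration.
const insertManySketch = `
INSERT INTO river_job (kind, args, queue, priority)
SELECT
    json_extract(value, '$.kind'),
    json_extract(value, '$.args'),
    json_extract(value, '$.queue'),
    json_extract(value, '$.priority')
FROM json_each(?1)
`

// The caller binds a single JSON array parameter, e.g.
// `[{"kind":"email","args":{},"queue":"default","priority":1}, ...]`,
// so the whole batch goes through one statement instead of N inserts.
```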

brandur · May 16 '25 03:05

Ah damn, I forgot that the completer tries to set a specific finalized_at per completed job. Hard to do given current sqlc limitations.

brandur · May 16 '25 04:05

Dang. Just making sure you don't actually want me to review this or anything, since you've hit a blocker.

bgentry · May 23 '25 20:05

Nah, no review required for now. Thx.

brandur · May 24 '25 00:05