graph-node Move slow db interactions to tokio blocking pool

Move slow db interactions to tokio blocking pool

Open lutter opened this issue 5 years ago • 4 comments

All database interactions going through Diesel are synchronous, even though most of them happen within tokio's cooperative multitasking framework. That means that long-running db interactions will block the tokio worker they are running on; if enough workers (by default, one per CPU core) block, the whole system grinds to a halt. At that point, no work can be done, including work that doesn't even involve the database, but simply gets scheduled on the tokio runtime.

Upstream Diesel is not going to add async database interactions anytime soon so we need to come up with our own approach. One possible way to address this is with tokio's tokio_threadpool::blocking where slow work gets pushed to a separate pool of slow tasks, freeing up the main workers.

Open questions around this:

how to best integrate that and how to change our internal API's (should most Store functions return Future?)
how to determine which db work is slow and needs to be treated that way (might be that we should do all db work on the blocking pool)

May 02 '19 15:05 lutter

What we should do is change the store to move actual work to the blocking pool itself by having some internal function Store.with_connection(f: Fn(Connection) -> Result) -> impl Future<Result> which first acquires a semaphore sized according to the max number of connections in the pool and then executes f on the blocking pool. We'd then change all Store methods that right now just get a connection to use with_connection and do their work inside that.

Once we do that, we should also remove the semaphore introduced in #1522

Mar 04 '20 19:03 lutter

We have with_conn now so the API question is solved. Maybe we don't use it everywhere we should but keeping this issue open isn't what's going to solve that.

Apr 05 '21 23:04 leoyvens

Actually we have way too much spawn_blocking around our code to consider this done. This will be done when graph::task_spawn::spawn_blocking is no longer in use, since it assumes that a future may block.

Apr 06 '21 21:04 leoyvens

cc @mangas because we discussed this during yesterday's standup.

Sep 14 '22 09:09 neysofu

graph-node graph-node copied to clipboard

Move slow db interactions to tokio blocking pool

graph-node
graph-node copied to clipboard