tiled icon indicating copy to clipboard operation
tiled copied to clipboard

Make sqlalchemy use connection pooling for catalog db

Open nmaytan opened this issue 1 year ago • 4 comments

This PR results from the investigation done in #663, as well as a group coding session to implement a new benchmark for the catalog db (especially with input from @genematx, @Kezzsim, @danielballan).

I spent some extra time experimenting with asv to understand better what it does before putting up this PR. The new benchmark can be singled out by running: asv run --environment existing:same --bench "time_repeated_write".

While some initial benchmarks seemed to show a modest performance improvement from connection pooling, I found (after running additional benchmarks) that this improvement was really just variance in the benchmark results. Instead, benchmarks seem to show that performance is ~identical regardless of connection pooling. This at least means that there is no apparent regression, which was our primary concern.

asv has several parameters that can adjust how benchmarks are taken, with the ability to customize things like warmup time, rounds per benchmark, samples per round, benchmark timeouts, etc. I played with these settings to see if they had any impact on our results, and it seemed the answer is "no".

It is worth pointing out that asv defaults to timeit.default_timer - and so it should measure the time spent in subprocesses. I was worried about this, since aiosqlite drops into a thread (per db connection).

nmaytan avatar Apr 05 '24 16:04 nmaytan

For test failures, it looks like you are missing an import:

        engine = create_async_engine(
>           uri, echo=echo, json_serializer=json_serializer, poolclass=AsyncAdaptedQueuePool
        )
E       NameError: name 'AsyncAdaptedQueuePool' is not defined

For linter issues:

pip install pre-commit
pre-commit install  # set up git hooks
pre-commit run --all-files
git commit -am "Fix linter issues."

danielballan avatar Apr 05 '24 16:04 danielballan

Sorry, silly blunder.

nmaytan avatar Apr 05 '24 17:04 nmaytan

I see some pytest failures. They look related but it is not immediately obvious to me what broke.

danielballan avatar Apr 05 '24 19:04 danielballan

Correction: I had misunderstood that asv was running against the version of tiled installed in my environment, and not against the local repository I was working in. Now performing proper benchmarks where the change between pooling / not pooling is actually captured, the speedup looks significant when using pooling:

Pool Run Time Uncertainty Unit
Yes 1 2.69 0.02 s
Yes 2 2.84 0.2 s
Yes 3 2.94 0.05 s
No 4 5.17 0.07 s
No 5 5.26 0.07 s
No 6 5.31 0.1 s
Yes 7 2.87 0.02 s
Yes 8 3.04 0.1 s
Yes 9 2.77 0.02 s
No 10 4.84 0 s
No 11 5.18 0.1 s
No 12 5.41 0.01 s

edit: this improvement scales, here is a run for range(1000) rather than 100:

Pool Run Time Uncertainty Unit
No 1 failed
Yes 2 25.8 0.3 s
No 3 54.5 0.3 s

The first run was too slow for the default timeout, and I had to increase it:

asv run --environment existing:same --bench "time_repeated_write" -a timeout=300

nmaytan avatar Apr 05 '24 22:04 nmaytan

The test failures arose from trying to use a pool with an in-memory (:memory:) SQLite database. Making the pool conditional resolved the problem in a commit pushed above. With this change in place, I re-verified the benchmarks, before and after this diff:

diff --git a/tiled/catalog/adapter.py b/tiled/catalog/adapter.py
index e1c2f1af..b297155d 100644
--- a/tiled/catalog/adapter.py
+++ b/tiled/catalog/adapter.py
@@ -1293,7 +1293,7 @@ def from_uri(
         uri = f"sqlite+aiosqlite:///{uri}"
 
     parsed_url = make_url(uri)
-    if (parsed_url.get_dialect().name == "sqlite") and (
+    if False and (parsed_url.get_dialect().name == "sqlite") and (
         parsed_url.database != ":memory:"
     ):
         # For file-backed SQLite databases, connection pooling offers a
Pool Run Time Uncertainty Unit
Yes 1 1.36 0.01 s
Yes 2 1.51 0.04 s
Yes 3 1.43 0.01 s
No 4 2.04 0.05 s
No 5 1.97 0.04 s
No 6 2.07 0.01 s

danielballan avatar Apr 25 '24 21:04 danielballan