Try to get to the bottom of outlier transactions
Overview
The following endpoints have some crazy-long durations:
- decompile
- compile
TODO:
- [ ] add django migration for idx_scratch_upper_name
- [ ] add django migration for idx_scratch_upper_diff_label
- [ ] add django migration for idx_platform_covering
As for decompile taking a long time: m2c is known to sometimes recurse heavily and be very slow on some code.
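One possible mitigation (a sketch only; this is not necessarily how decomp.me invokes m2c) is to run the decompiler under a wall-clock timeout, so a pathological input can't hold a worker indefinitely:

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30.0):
    """Run cmd, returning the CompletedProcess, or None if it timed out."""
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None


# e.g. run_with_timeout(["m2c", "some_function.c"])  # hypothetical invocation
```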
/api/scratch/{pk}/family
The /family endpoint has been improved via https://github.com/decompme/decomp.me/pull/1559.
/search
Added an index to help the /search endpoint (the case-insensitive search on scratch name was taking ~1s).
BEFORE
>>> qs = Scratch.objects.filter(name__icontains="foo").order_by("-creation_time")[:5]
>>> print(qs.explain(analyze=True))
Limit (cost=45254.09..45254.67 rows=5 width=1040) (actual time=1157.501..1163.304 rows=5 loops=1)
-> Gather Merge (cost=45254.09..45885.76 rows=5414 width=1040) (actual time=1157.499..1163.300 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=44254.06..44260.83 rows=2707 width=1040) (actual time=1152.080..1152.082 rows=4 loops=3)
Sort Key: creation_time DESC
Sort Method: top-N heapsort Memory: 31kB
Worker 0: Sort Method: top-N heapsort Memory: 31kB
Worker 1: Sort Method: top-N heapsort Memory: 30kB
-> Parallel Seq Scan on coreapp_scratch (cost=0.00..44209.10 rows=2707 width=1040) (actual time=296.594..1151.864 rows=28 loops=3)
Filter: (upper((name)::text) ~~ '%FOO%'::text)
Rows Removed by Filter: 54113
Planning Time: 0.894 ms
Execution Time: 1163.346 ms
>>>
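(Aside: the "top-N heapsort" in the plan means Postgres keeps only the 5 newest rows in a bounded heap rather than fully sorting everything that passes the filter; the same idea in Python:)

```python
import heapq

# Toy stand-in for Scratch rows: only creation_time matters here.
rows = [{"name": f"scratch{i}", "creation_time": i} for i in range(1000)]

# ORDER BY creation_time DESC LIMIT 5 as a bounded heap, not a full sort.
newest5 = heapq.nlargest(5, rows, key=lambda r: r["creation_time"])
print([r["creation_time"] for r in newest5])  # [999, 998, 997, 996, 995]
```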
Add indexes on the uppercase scratch name and diff_label:
>>> from django.db import connection
>>>
>>> with connection.cursor() as cursor:
... cursor.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_scratch_upper_name ON coreapp_scratch (UPPER(name));")
...
>>> with connection.cursor() as cursor:
... cursor.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_scratch_upper_diff_label ON coreapp_scratch (UPPER(diff_label));")
...
AFTER
>>> qs = Scratch.objects.filter(name__icontains="foo").order_by("-creation_time")[:5]
>>> print(qs.explain(analyze=True))
Limit (cost=45254.14..45254.73 rows=5 width=1040) (actual time=67.494..72.637 rows=5 loops=1)
-> Gather Merge (cost=45254.14..45885.82 rows=5414 width=1040) (actual time=67.493..72.633 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=44254.12..44260.89 rows=2707 width=1040) (actual time=64.723..64.725 rows=5 loops=3)
Sort Key: creation_time DESC
Sort Method: top-N heapsort Memory: 30kB
Worker 0: Sort Method: top-N heapsort Memory: 33kB
Worker 1: Sort Method: top-N heapsort Memory: 31kB
-> Parallel Seq Scan on coreapp_scratch (cost=0.00..44209.16 rows=2707 width=1040) (actual time=12.449..64.621 rows=28 loops=3)
Filter: (upper((name)::text) ~~ '%FOO%'::text)
Rows Removed by Filter: 54114
Planning Time: 1.255 ms
Execution Time: 72.669 ms
/api/platform/{id}
Huge improvement on the platform query. Pulling out num_scratches was triggering a sequential scan over the Scratch table. Adding an index on platform means there's no need to scan, and raising the per-query memory limit (work_mem) from 4MB to 64MB keeps the sort in memory instead of spilling to disk.
BEFORE
>>> qs = Scratch.objects.filter(platform="ps2")
>>> print(qs.explain(analyze=True))
Gather Merge (cost=47500.29..48615.23 rows=9556 width=1040) (actual time=1200.342..1211.870 rows=11558 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=46500.26..46512.21 rows=4778 width=1040) (actual time=1193.132..1194.490 rows=3853 loops=3)
Sort Key: creation_time DESC
Sort Method: external merge Disk: 3832kB
Worker 0: Sort Method: external merge Disk: 3800kB
Worker 1: Sort Method: external merge Disk: 3832kB
-> Parallel Seq Scan on coreapp_scratch (cost=0.00..44034.78 rows=4778 width=1040) (actual time=1.591..1174.221 rows=3853 loops=3)
Filter: ((platform)::text = 'ps2'::text)
Rows Removed by Filter: 50284
Planning Time: 0.860 ms
Execution Time: 1213.847 ms
Adding an index on scratch platform means we don't do a sequential scan. The INCLUDE (slug) makes it a covering index, so queries that only need the slug for a given platform can be answered by an index-only scan:
>>> with connection.cursor() as cursor:
... # Create the new covering index (non-transactional, so avoid wrapping in large transactions)
... cursor.execute("CREATE INDEX CONCURRENTLY idx_platform_covering ON coreapp_scratch (platform) INCLUDE (slug);")
Updating work_mem at database level:
decompme=# ALTER DATABASE decompme SET work_mem = '64MB';
ALTER DATABASE
AFTER
>>> with connection.cursor() as cursor:
... cursor.execute("SET work_mem = '64MB';")
... print(Scratch.objects.filter(platform="ps2").explain(analyze=True))
...
Sort (cost=27082.57..27111.41 rows=11537 width=1040) (actual time=32.002..33.565 rows=11558 loops=1)
Sort Key: creation_time DESC
Sort Method: quicksort Memory: 16438kB
-> Bitmap Heap Scan on coreapp_scratch (cost=269.83..26304.17 rows=11537 width=1040) (actual time=2.426..15.184 rows=11558 loops=1)
Recheck Cond: ((platform)::text = 'ps2'::text)
Heap Blocks: exact=8521
-> Bitmap Index Scan on idx_platform_covering (cost=0.00..266.95 rows=11537 width=0) (actual time=1.261..1.261 rows=11558 loops=1)
Index Cond: ((platform)::text = 'ps2'::text)
Planning Time: 0.110 ms
Execution Time: 34.612 ms
insanely nice speedup 🎊
After migrating to the new, bigger + faster box, you can see the average query time has dropped nicely (the migration happened around 75% of the way along the x-axis):