Try to get to the bottom of outlier transactions
Overview
The following endpoints have some crazy-long durations:
- decompile
- compile
TODO:
- [ ] add django migration for idx_scratch_upper_name
- [ ] add django migration for idx_scratch_upper_diff_label
- [ ] add django migration for idx_platform_covering
As for decompile taking a long time: m2c is known to sometimes recurse heavily and be very slow on some code.
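One possible mitigation (a sketch only; this is not necessarily how decomp.me invokes m2c) is to run the decompiler under a wall-clock timeout, so a pathological input can't hold a worker indefinitely:

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30.0):
    """Run cmd, returning the CompletedProcess, or None if it timed out."""
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None


# e.g. run_with_timeout(["m2c", "some_function.c"])  # hypothetical invocation
```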
/api/scratch/{pk}/family
The /family endpoint has been improved via https://github.com/decompme/decomp.me/pull/1559.
/search
Added an index to help the /search endpoint (the case-insensitive search on scratch name was taking ~1s).
BEFORE
>>> qs = Scratch.objects.filter(name__icontains="foo").order_by("-creation_time")[:5]
>>> print(qs.explain(analyze=True))
Limit (cost=45254.09..45254.67 rows=5 width=1040) (actual time=1157.501..1163.304 rows=5 loops=1)
-> Gather Merge (cost=45254.09..45885.76 rows=5414 width=1040) (actual time=1157.499..1163.300 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=44254.06..44260.83 rows=2707 width=1040) (actual time=1152.080..1152.082 rows=4 loops=3)
Sort Key: creation_time DESC
Sort Method: top-N heapsort Memory: 31kB
Worker 0: Sort Method: top-N heapsort Memory: 31kB
Worker 1: Sort Method: top-N heapsort Memory: 30kB
-> Parallel Seq Scan on coreapp_scratch (cost=0.00..44209.10 rows=2707 width=1040) (actual time=296.594..1151.864 rows=28 loops=3)
Filter: (upper((name)::text) ~~ '%FOO%'::text)
Rows Removed by Filter: 54113
Planning Time: 0.894 ms
Execution Time: 1163.346 ms
>>>
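(Aside: the "top-N heapsort" in the plan means Postgres keeps only the 5 newest rows in a bounded heap rather than fully sorting everything that passes the filter; the same idea in Python:)

```python
import heapq

# Toy stand-in for Scratch rows: only creation_time matters here.
rows = [{"name": f"scratch{i}", "creation_time": i} for i in range(1000)]

# ORDER BY creation_time DESC LIMIT 5 as a bounded heap, not a full sort.
newest5 = heapq.nlargest(5, rows, key=lambda r: r["creation_time"])
print([r["creation_time"] for r in newest5])  # [999, 998, 997, 996, 995]
```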
Add indexes on the uppercase scratch name and diff_label:
>>> from django.db import connection
>>>
>>> with connection.cursor() as cursor:
... cursor.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_scratch_upper_name ON coreapp_scratch (UPPER(name));")
...
>>> with connection.cursor() as cursor:
... cursor.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_scratch_upper_diff_label ON coreapp_scratch (UPPER(diff_label));")
...
AFTER
>>> qs = Scratch.objects.filter(name__icontains="foo").order_by("-creation_time")[:5]
>>> print(qs.explain(analyze=True))
Limit (cost=45254.14..45254.73 rows=5 width=1040) (actual time=67.494..72.637 rows=5 loops=1)
-> Gather Merge (cost=45254.14..45885.82 rows=5414 width=1040) (actual time=67.493..72.633 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=44254.12..44260.89 rows=2707 width=1040) (actual time=64.723..64.725 rows=5 loops=3)
Sort Key: creation_time DESC
Sort Method: top-N heapsort Memory: 30kB
Worker 0: Sort Method: top-N heapsort Memory: 33kB
Worker 1: Sort Method: top-N heapsort Memory: 31kB
-> Parallel Seq Scan on coreapp_scratch (cost=0.00..44209.16 rows=2707 width=1040) (actual time=12.449..64.621 rows=28 loops=3)
Filter: (upper((name)::text) ~~ '%FOO%'::text)
Rows Removed by Filter: 54114
Planning Time: 1.255 ms
Execution Time: 72.669 ms
/api/platform/{id}
Huge improvement on the platform query. Pulling out num_scratches was triggering a sequential scan over the Scratch table. Adding an index on platform means there's no need to scan, and raising the per-query memory limit (work_mem) from 4MB to 64MB keeps the sort in memory instead of spilling to disk.
BEFORE
>>> qs = Scratch.objects.filter(platform="ps2")
>>> print(qs.explain(analyze=True))
Gather Merge (cost=47500.29..48615.23 rows=9556 width=1040) (actual time=1200.342..1211.870 rows=11558 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=46500.26..46512.21 rows=4778 width=1040) (actual time=1193.132..1194.490 rows=3853 loops=3)
Sort Key: creation_time DESC
Sort Method: external merge Disk: 3832kB
Worker 0: Sort Method: external merge Disk: 3800kB
Worker 1: Sort Method: external merge Disk: 3832kB
-> Parallel Seq Scan on coreapp_scratch (cost=0.00..44034.78 rows=4778 width=1040) (actual time=1.591..1174.221 rows=3853 loops=3)
Filter: ((platform)::text = 'ps2'::text)
Rows Removed by Filter: 50284
Planning Time: 0.860 ms
Execution Time: 1213.847 ms
Adding an index on scratch platform means we don't do a sequential scan. The INCLUDE (slug) makes it a covering index, so queries that only need the slug for a given platform can be answered by an index-only scan:
>>> with connection.cursor() as cursor:
... # Create the new covering index (non-transactional, so avoid wrapping in large transactions)
... cursor.execute("CREATE INDEX CONCURRENTLY idx_platform_covering ON coreapp_scratch (platform) INCLUDE (slug);")
Updating work_mem at database level:
decompme=# ALTER DATABASE decompme SET work_mem = '64MB';
ALTER DATABASE
AFTER
>>> with connection.cursor() as cursor:
... cursor.execute("SET work_mem = '64MB';")
... print(Scratch.objects.filter(platform="ps2").explain(analyze=True))
...
Sort (cost=27082.57..27111.41 rows=11537 width=1040) (actual time=32.002..33.565 rows=11558 loops=1)
Sort Key: creation_time DESC
Sort Method: quicksort Memory: 16438kB
-> Bitmap Heap Scan on coreapp_scratch (cost=269.83..26304.17 rows=11537 width=1040) (actual time=2.426..15.184 rows=11558 loops=1)
Recheck Cond: ((platform)::text = 'ps2'::text)
Heap Blocks: exact=8521
-> Bitmap Index Scan on idx_platform_covering (cost=0.00..266.95 rows=11537 width=0) (actual time=1.261..1.261 rows=11558 loops=1)
Index Cond: ((platform)::text = 'ps2'::text)
Planning Time: 0.110 ms
Execution Time: 34.612 ms
insanely nice speedup 🎊
After migrating to the new, bigger + faster box, you can see the average query time has dropped nicely (the migration happened around 75% of the way along the x-axis):