squad
TestComparison: apply transitions while fetching tests
TestComparison is the last bottleneck we currently have. It:
- causes workers to be killed due to OOM, which makes `ProjectStatus.create_or_update` fail, therefore causing the `celery_chord` in the tradefed plugin to fail as well
- causes Build Comparison timeouts
- and I'm sure it causes some serious delay when generating `Notification` objects, because a comparison object is given to it
The main source of this problem is that we load all tests into memory and then apply transitions (pass->fail, fail->pass, etc.); more details here. Bottom line: in most cases, we only need to load the tiny portion of tests that are actually useful.
I want to rework `TestComparison` so that we discard tests that don't fit the wanted transitions on the fly. I still don't know how yet :)
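A minimal sketch of the "on the fly" idea (the names and data shapes here are mine, not SQUAD's actual API): iterate over baseline/target result pairs coming from the database and keep only the wanted transitions, so the full test list never sits in memory at once.

```python
# Sketch only: stream over (name, baseline_result, target_result) tuples,
# e.g. from a server-side cursor, and keep just the wanted transitions.
WANTED_TRANSITIONS = {
    (True, False),   # pass -> fail: regression
    (False, True),   # fail -> pass: fix
}

def interesting_tests(pairs, wanted=WANTED_TRANSITIONS):
    """Yield only the tuples whose (baseline, target) transition is wanted.

    `pairs` is any iterable, so nothing forces the whole test list into RAM.
    """
    for name, baseline, target in pairs:
        if (baseline, target) in wanted:
            yield name, baseline, target
```

The point is that the filter is a generator: tests outside the wanted transitions are discarded as they are read, instead of after everything has been loaded.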
For future reference:
I used two builds in staging, baseline and target, as a PoC that a build reference in the Test table would make things significantly faster. Those builds contain a single testrun each (2840382 and 2840381), each containing 1.3M+ tests.
Normally a build contains many more testruns, and that's where the problem lies: we have no direct way of comparing tests from two different builds without joining the testrun table. By adding `build_id` to `Test`, we could use `build_id` directly in the snippet below, instead of `test_run_id`.
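As a sketch of what that would look like (the `build_id` column on `core_test` is hypothetical, it does not exist yet), the raw query could compare builds directly, with no testrun join:

```python
# Hypothetical: assumes a proposed build_id column on core_test, which does
# NOT exist yet. With it, the comparison no longer goes through the testrun
# table at all -- one query covers all testruns of both builds.
BUILD_COMPARISON_SQL = (
    'SELECT t1.* FROM core_test t1, core_test t2 '
    'WHERE t1.build_id = %s AND t2.build_id = %s '
    'AND t1.metadata_id = t2.metadata_id AND t1.result != t2.result'
)
```

This is the same shape as the testrun-level snippet below, just keyed by build instead of testrun.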
Currently it takes several minutes (maybe 5, maybe 10, depending on load) and almost 5GB of RAM just to run `ProjectStatus.create_or_update`.
I was able to run the same "query" to get fixes and regressions with the snippet below:
```python
from squad.core.models import Test

baseline_tr = 2840382
target_tr = 2840381

# Note: Django's raw() takes %s placeholders regardless of type; the params
# are passed separately to the database driver.
tests = Test.objects.raw(
    'SELECT t1.* FROM core_test t1, core_test t2 '
    'WHERE t1.test_run_id = %s AND t2.test_run_id = %s '
    'AND t1.metadata_id = t2.metadata_id AND t1.result != t2.result',
    [baseline_tr, target_tr],
)

# The query above only returns baseline tests whose result differs from the
# corresponding target test: if a baseline test with result=False is
# returned, the same test in the target run has result=True, so it is
# considered a regression. A fix is the other way around (I'm not taking
# intermittent tests into account yet).
# regressions_and_fixes[False] -> regressions
# regressions_and_fixes[True]  -> fixes
regressions_and_fixes = {True: [], False: []}
for test in tests:
    regressions_and_fixes[test.result].append(test)
```
Running this snippet takes about 5 seconds and uses about 22MB of disk space to sort things out:
```
stagingqareports=> explain analyze SELECT t1.*
FROM
    core_test t1,
    core_test t2
WHERE
    t1.test_run_id = 2840382 AND
    t2.test_run_id = 2840381 AND
    t1.metadata_id = t2.metadata_id AND
    t1.result != t2.result;
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=11417.03..12082.59 rows=2134 width=199) (actual time=3747.691..3756.845 rows=1 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   ->  Merge Join  (cost=10417.03..10869.19 rows=1255 width=199) (actual time=3048.631..3734.013 rows=0 loops=2)
         Merge Cond: (t1.metadata_id = t2.metadata_id)
         Join Filter: (t1.result <> t2.result)
         Rows Removed by Join Filter: 664846
         ->  Sort  (cost=5531.46..5581.40 rows=19976 width=199) (actual time=827.639..1089.421 rows=664847 loops=2)
               Sort Key: t1.metadata_id
               Sort Method: external merge  Disk: 23512kB
               Worker 0:  Sort Method: external merge  Disk: 22488kB
               ->  Parallel Index Scan using core_test_ba18909e on core_test t1  (cost=0.57..2190.07 rows=19976 width=199) (actual time=0.017..348.175 rows=664847 loops=2)
                     Index Cond: (test_run_id = 2840382)
         ->  Sort  (cost=4885.58..4970.47 rows=33959 width=5) (actual time=1518.442..1943.952 rows=1329638 loops=2)
               Sort Key: t2.metadata_id
               Sort Method: external sort  Disk: 24832kB
               Worker 0:  Sort Method: external sort  Disk: 24832kB
               ->  Index Scan using core_test_ba18909e on core_test t2  (cost=0.57..2329.90 rows=33959 width=5) (actual time=0.033..625.049 rows=1329693 loops=2)
                     Index Cond: (test_run_id = 2840381)
 Planning Time: 0.284 ms
 Execution Time: 3770.148 ms
(21 rows)
```
This timing could be improved by setting `work_mem = '32MB'` in Postgres, but I don't want to go there yet.
NOTE: Getting other transitions, e.g. pass -> n/a, would require different query designs, but I think they would still be very fast, and pagination would be much easier.
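For example, a pass -> n/a transition can't come out of the inner join above, since the test has no row in the target run at all. A sketch of one possible design (my assumption, not a settled query) is an anti-join, which also shows why per-transition pagination gets easy:

```python
# Sketch: find baseline tests that passed but have no counterpart in the
# target testrun ("pass -> n/a"). Table and column names match the snippet
# above; params order is [target_tr, baseline_tr, limit, offset].
PASS_TO_NA_SQL = """
SELECT t1.*
FROM core_test t1
LEFT JOIN core_test t2
  ON t2.metadata_id = t1.metadata_id AND t2.test_run_id = %s
WHERE t1.test_run_id = %s
  AND t1.result = true
  AND t2.id IS NULL
ORDER BY t1.id
LIMIT %s OFFSET %s
"""
```

Because each transition becomes its own small query, `LIMIT`/`OFFSET` (or keyset pagination on `t1.id`) applies naturally, instead of paginating an in-memory comparison object.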
I slept on it, and I also think having an environment reference in the Test table is fundamental for correct comparisons. There are cases where the same test is run multiple times for different environments.
There are also cases where the same test is run multiple times for the same environment. I think this can be solved with the "confidence result" that @mwasilew suggested a while ago.