CDash icon indicating copy to clipboard operation
CDash copied to clipboard

build summary takes more than a minute to load

Open vlebourl opened this issue 1 year ago • 9 comments
trafficstars

I'm on cdash 3.2.3 with a huge database (more than 900G). Loading a build takes forever:

cdash           | [Mon Feb 26 10:36:32.160692 2024] [php:notice] [pid 282] [client 10.10.10.51:50871] [2024-02-26 10:36:32] production.WARNING: Slow page: /api/v1/buildSummary.php?buildid=1857965 took 83.14 seconds to load  \n, referer: https://cdash.starqube.com/build/1857965
$ artisan db:show

   ...........................................................................  
  Database ............................................................. cdash  
  Host .............................................................. database  
  Port .................................................................. 5432  
  Username ............................................................. cdash  
  URL ........................................................................  
  Open Connections ......................................................... 2  
  Tables ................................................................. 118  
  Total Size ................................................... 961,320.01MiB  

  Table ........................................................... Size (MiB)  
  feed .................................................................. 0.02  
  dailyupdatefile ....................................................... 0.03  
  build2group ......................................................... 154.95  
  client_library ........................................................ 0.02  
  site2user ............................................................. 0.04  
  client_jobschedule2site ............................................... 0.01  
  client_jobschedule2submission ......................................... 0.02  
  client_jobschedule2os ................................................. 0.01  
  test2image ............................................................ 0.03  
  client_jobschedule2library ............................................ 0.01  
  saml2_tenants ......................................................... 0.02  
  client_jobschedule2build .............................................. 0.01  
  version ............................................................... 0.01  
  password .............................................................. 0.01  
  client_site2project ................................................... 0.01  
  client_jobschedule2compiler ........................................... 0.01  
  builderrordiff ....................................................... 15.89  
  client_jobschedule2cmake .............................................. 0.01  
  build2note ............................................................ 0.02  
  subprojectgroup ....................................................... 0.02  
  client_jobschedule .................................................... 0.05  
  dailyupdate ........................................................... 0.52  
  build2uploadfile ...................................................... 0.02  
  jobs .................................................................. 0.10  
  testmeasurement ................................................. 138,274.70  
  coveragesummarydiff ................................................... 2.67  
  build2grouprule ....................................................... 0.12  
  client_job ............................................................ 0.04  
  coveragesummary ....................................................... 2.75  
  buildfailure ......................................................... 23.63  
  client_jobschedule2toolkit ............................................ 0.01  
  updatefile ........................................................... 98.17  
  failed_jobs ........................................................... 0.05  
  lockout ............................................................... 0.01  
  coveragefilepriority .................................................. 0.07  
  labelemail ............................................................ 0.02  
  client_compiler ....................................................... 0.02  
  label2update .......................................................... 0.01  
  successful_jobs ....................................................... 0.06  
  repositories .......................................................... 0.01  
  coveragefilelog .................................................. 17,942.82  
  coverage .......................................................... 2,027.63  
  buildfailure2argument ............................................... 328.82  
  coveragefile2user ..................................................... 0.02  
  coveragefile ........................................................ 445.54  
  client_cmake .......................................................... 0.01  
  configureerrordiff .................................................... 0.16  
  siteinformation ....................................................... 0.09  
  build2test ...................................................... 363,458.87  
  configureerror ........................................................ 2.13  
  testoutput ...................................................... 347,350.63  
  client_toolkit ........................................................ 0.02  
  label2test ....................................................... 81,120.70  
  migrations ............................................................ 0.05  
  client_toolkitconfiguration ........................................... 0.03  
  buildfailureargument .................................................. 1.11  
  build ............................................................. 1,092.21  
  banner ................................................................ 0.02  
  configure ......................................................... 2,507.82  
  subproject2build ...................................................... 0.02  
  client_toolkitconfiguration2os ........................................ 0.02  
  buildfailuredetails .................................................. 73.68  
  client_toolkitversion ................................................. 0.03  
  uploadfile ............................................................ 0.02  
  password_resets ....................................................... 0.01  
  label2dynamicanalysis ............................................... 881.68  
  test .................................................................. 0.40  
  buildemail ............................................................ 0.03  
  client_site2program ................................................... 0.02  
  label2coveragefile .................................................... 0.01  
  user .................................................................. 0.05  
  projectjobscript ...................................................... 0.02  
  label2buildfailure .................................................... 0.02  
  site .................................................................. 0.06  
  buildgroup ............................................................ 0.08  
  related_builds ........................................................ 0.02  
  label2build ........................................................... 0.02  
  apitoken .............................................................. 0.01  
  client_site2library ................................................... 0.02  
  label ................................................................. 0.18  
  project ............................................................... 0.06  
  client_site2compiler .................................................. 0.02  
  image ................................................................. 0.06  
  summaryemail .......................................................... 0.03  
  client_site2cmake ..................................................... 0.02  
  buildfile ............................................................. 3.19  
  projectrobot .......................................................... 0.02  
  filesum ............................................................... 0.02  
  measurement ........................................................... 0.05  
  user2project .......................................................... 0.09  
  authtoken ............................................................. 0.03  
  build_filters ......................................................... 0.02  
  user2repository ....................................................... 0.06  
  build2configure ...................................................... 98.25  
  subproject ............................................................ 0.04  
  client_site ........................................................... 0.03  
  build2update ......................................................... 92.41  
  buildgroupposition .................................................... 0.07  
  testdiff ............................................................. 19.89  
  errorlog .............................................................. 0.05  
  project2repositories .................................................. 0.01  
  buildinformation ..................................................... 99.74  
  overview_components ................................................... 0.05  
  dynamicanalysissummary ............................................... 16.52  
  dynamicanalysisdefect ................................................ 12.86  
  subproject2subproject ................................................. 0.02  
  client_os ............................................................. 0.04  
  userstatistics ........................................................ 2.28  
  buildnote ............................................................. 0.02  
  builderror ........................................................ 1,912.35  
  dynamicanalysis ................................................... 3,026.26  
  usertemp .............................................................. 0.02  
  pending_submissions ................................................... 2.49  
  buildproperties ....................................................... 0.02  
  note .................................................................. 0.02  
  buildtesttime ........................................................ 39.00  
  blockbuild ............................................................ 0.05  
  buildupdate ......................................................... 184.63  

Is there any way to improve this? Is this related to the database size?

Cheers

vlebourl avatar Feb 26 '24 10:02 vlebourl

A couple things to check:

  1. If you have database access, you could do a show processlist when the bulid summary page is slowly loading. That should provide some information about what queries are performing poorly. If you can share this information with us, we'll take a closer look to see if there's anything we can update in CDash to make these queries more performant.
  2. You can also do a SELECT min(starttime) FROM build to see if automatic deletion of old builds is working properly or not. If you have ancient builds in your database, you can manually trigger the autoremove script by using the php artisan build:remove all command from the root of your CDash directory. This will likely also be a very slow operation, but reclaiming disk space and reducing the size of your database should help improve CDash's performance.

zackgalbreath avatar Mar 01 '24 14:03 zackgalbreath

Here are the requests for building a summary in my case, the very long one at the end is the one taking all the time:

LOG:  statement: SELECT * FROM buildnote WHERE buildid='1861237' ORDER BY timestamp ASC
LOG:  statement: select * from "site" where "site"."id" = 87 limit 1
LOG:  statement: select count(*) as aggregate from "build2note" where "buildid" = 1861237
LOG:  statement: SELECT * FROM buildinformation WHERE buildid = '1861237'
LOG:  statement: SELECT label.id as labelid FROM label WHERE label.id IN
	                (SELECT labelid AS id FROM label2build
	                 WHERE label2build.buildid = '1861237') OR label.id IN
	                    (SELECT labelid AS id FROM label2test
	                     WHERE label2test.buildid = '1861237') OR label.id IN
	                    (SELECT labelid AS id FROM label2coveragefile
	                     WHERE label2coveragefile.buildid = '1861237') OR label.id IN (
	                    SELECT l2bf.labelid AS id
	                    FROM label2buildfailure AS l2bf
	                    LEFT JOIN buildfailure AS bf ON (bf.id=l2bf.buildfailureid)
	                    LEFT JOIN buildfailuredetails AS bfd ON (bfd.id=bf.detailsid)
	                    WHERE bfd.type='0' AND bf.buildid = '1861237') OR label.id IN (
	                    SELECT l2bf.labelid AS id
	                    FROM label2buildfailure AS l2bf
	                    LEFT JOIN buildfailure AS bf ON (bf.id=l2bf.buildfailureid)
	                    LEFT JOIN buildfailuredetails AS bfd ON (bfd.id=bf.detailsid)
	                    WHERE bfd.type='1' AND bf.buildid = '1861237') OR label.id IN
	                  (SELECT labelid AS id FROM label2dynamicanalysis l2da
	                     JOIN dynamicanalysis da ON l2da.dynamicanalysisid = da.id
	                     WHERE da.buildid = '1861237')

as per your second remark, we actually dont remove any build, but that is a policy we are enforcing and I can't change it on my own 😞 The oldest one is from 2013...

vlebourl avatar Mar 04 '24 14:03 vlebourl

What do you get when you EXPLAIN the slow query on your system? If you're using MySQL, ensure you run SET @@explain_format=TREE; to get tree structuring. Postgres does that automatically.

williamjallen avatar Mar 04 '24 14:03 williamjallen

I must confess I'm very know to postgres... Here's the raw output:

cdash=# EXPLAIN SELECT label.id as labelid FROM label WHERE label.id IN
                 (SELECT labelid AS id FROM label2build
                  WHERE label2build.buildid = '1861237') OR label.id IN
                     (SELECT labelid AS id FROM label2test
                      WHERE label2test.buildid = '1861237') OR label.id IN
                     (SELECT labelid AS id FROM label2coveragefile
                      WHERE label2coveragefile.buildid = '1861237') OR label.id IN (
                     SELECT l2bf.labelid AS id
                     FROM label2buildfailure AS l2bf
                     LEFT JOIN buildfailure AS bf ON (bf.id=l2bf.buildfailureid)
                     LEFT JOIN buildfailuredetails AS bfd ON (bfd.id=bf.detailsid)
                     WHERE bfd.type='0' AND bf.buildid = '1861237') OR label.id IN (
                     SELECT l2bf.labelid AS id
                     FROM label2buildfailure AS l2bf
                     LEFT JOIN buildfailure AS bf ON (bf.id=l2bf.buildfailureid)
                     LEFT JOIN buildfailuredetails AS bfd ON (bfd.id=bf.detailsid)
                     WHERE bfd.type='1' AND bf.buildid = '1861237') OR label.id IN
                   (SELECT labelid AS id FROM label2dynamicanalysis l2da
                      JOIN dynamicanalysis da ON l2da.dynamicanalysisid = da.id
                      WHERE da.buildid = '1861237');
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on label  (cost=12372958.93..12372996.58 rows=1167 width=8)
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2) OR (hashed SubPlan 3) OR (hashed SubPlan 4) OR (hashed SubPlan 5) OR (hashed SubPlan 6))
   SubPlan 1
     ->  Bitmap Heap Scan on label2build  (cost=4.22..14.76 rows=9 width=8)
           Recheck Cond: (buildid = '1861237'::bigint)
           ->  Bitmap Index Scan on label2build_buildid  (cost=0.00..4.22 rows=9 width=0)
                 Index Cond: (buildid = '1861237'::bigint)
   SubPlan 2
     ->  Seq Scan on label2test  (cost=0.00..12135839.14 rows=21443 width=8)
           Filter: (buildid = '1861237'::bigint)
   SubPlan 3
     ->  Seq Scan on label2coveragefile  (cost=0.00..29.62 rows=8 width=8)
           Filter: (buildid = '1861237'::bigint)
   SubPlan 4
     ->  Nested Loop  (cost=8.71..52.19 rows=1 width=8)
           ->  Hash Join  (cost=8.42..43.87 rows=1 width=16)
                 Hash Cond: (l2bf.buildfailureid = bf.id)
                 ->  Seq Scan on label2buildfailure l2bf  (cost=0.00..28.50 rows=1850 width=16)
                 ->  Hash  (cost=8.39..8.39 rows=3 width=16)
                       ->  Index Scan using buildid17 on buildfailure bf  (cost=0.29..8.39 rows=3 width=16)
                             Index Cond: (buildid = '1861237'::bigint)
           ->  Index Scan using buildfailuredetails_pkey on buildfailuredetails bfd  (cost=0.29..8.31 rows=1 width=8)
                 Index Cond: (id = bf.detailsid)
                 Filter: (type = '0'::smallint)
   SubPlan 5
     ->  Nested Loop  (cost=8.71..52.19 rows=1 width=8)
           ->  Hash Join  (cost=8.42..43.87 rows=1 width=16)
                 Hash Cond: (l2bf_1.buildfailureid = bf_1.id)
                 ->  Seq Scan on label2buildfailure l2bf_1  (cost=0.00..28.50 rows=1850 width=16)
                 ->  Hash  (cost=8.39..8.39 rows=3 width=16)
                       ->  Index Scan using buildid17 on buildfailure bf_1  (cost=0.29..8.39 rows=3 width=16)
                             Index Cond: (buildid = '1861237'::bigint)
           ->  Index Scan using buildfailuredetails_pkey on buildfailuredetails bfd_1  (cost=0.29..8.31 rows=1 width=8)
                 Index Cond: (id = bf_1.detailsid)
                 Filter: (type = '1'::smallint)
   SubPlan 6
     ->  Hash Join  (cost=420.88..236914.16 rows=1285 width=8)
           Hash Cond: (l2da.dynamicanalysisid = da.id)
           ->  Seq Scan on label2dynamicanalysis l2da  (cost=0.00..190185.40 rows=12345340 width=16)
           ->  Hash  (cost=412.66..412.66 rows=658 width=4)
                 ->  Index Scan using buildid6 on dynamicanalysis da  (cost=0.43..412.66 rows=658 width=4)
                       Index Cond: (buildid = '1861237'::bigint)
(42 rows)

vlebourl avatar Mar 04 '24 15:03 vlebourl

That's pretty useful. The sequential scans are probably the slowdown here. Since you're using Postgres, can you use EXPLAIN ANALYZE to actually run the query and report the query plan and associated costs? That will help narrow down which scan(s) in particular are problematic.

williamjallen avatar Mar 04 '24 15:03 williamjallen

cdash=# EXPLAIN ANALYZE SELECT label.id as labelid FROM label WHERE label.id IN
                 (SELECT labelid AS id FROM label2build
                  WHERE label2build.buildid = '1861237') OR label.id IN
                     (SELECT labelid AS id FROM label2test
                      WHERE label2test.buildid = '1861237') OR label.id IN
                     (SELECT labelid AS id FROM label2coveragefile
                      WHERE label2coveragefile.buildid = '1861237') OR label.id IN (
                     SELECT l2bf.labelid AS id
                     FROM label2buildfailure AS l2bf
                     LEFT JOIN buildfailure AS bf ON (bf.id=l2bf.buildfailureid)
                     LEFT JOIN buildfailuredetails AS bfd ON (bfd.id=bf.detailsid)
                     WHERE bfd.type='0' AND bf.buildid = '1861237') OR label.id IN (
                     SELECT l2bf.labelid AS id
                     FROM label2buildfailure AS l2bf
                     LEFT JOIN buildfailure AS bf ON (bf.id=l2bf.buildfailureid)
                     LEFT JOIN buildfailuredetails AS bfd ON (bfd.id=bf.detailsid)
                     WHERE bfd.type='1' AND bf.buildid = '1861237') OR label.id IN
                   (SELECT labelid AS id FROM label2dynamicanalysis l2da
                      JOIN dynamicanalysis da ON l2da.dynamicanalysisid = da.id
                      WHERE da.buildid = '1861237');
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on label  (cost=12373361.83..12373399.48 rows=1167 width=8) (actual time=80963.010..80963.593 rows=815 loops=1)
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2) OR (hashed SubPlan 3) OR (hashed SubPlan 4) OR (hashed SubPlan 5) OR (hashed SubPlan 6))
   Rows Removed by Filter: 224
   SubPlan 1
     ->  Bitmap Heap Scan on label2build  (cost=4.22..14.76 rows=9 width=8) (actual time=0.006..0.006 rows=0 loops=1)
           Recheck Cond: (buildid = '1861237'::bigint)
           ->  Bitmap Index Scan on label2build_buildid  (cost=0.00..4.22 rows=9 width=0) (actual time=0.001..0.001 rows=0 loops=1)
                 Index Cond: (buildid = '1861237'::bigint)
   SubPlan 2
     ->  Seq Scan on label2test  (cost=0.00..12136242.04 rows=21444 width=8) (actual time=52464.790..79155.376 rows=1695 loops=1)
           Filter: (buildid = '1861237'::bigint)
           Rows Removed by Filter: 643159133
   SubPlan 3
     ->  Seq Scan on label2coveragefile  (cost=0.00..29.62 rows=8 width=8) (actual time=0.004..0.004 rows=0 loops=1)
           Filter: (buildid = '1861237'::bigint)
   SubPlan 4
     ->  Nested Loop  (cost=8.71..52.19 rows=1 width=8) (actual time=0.653..0.653 rows=0 loops=1)
           ->  Hash Join  (cost=8.42..43.87 rows=1 width=16) (actual time=0.653..0.653 rows=0 loops=1)
                 Hash Cond: (l2bf.buildfailureid = bf.id)
                 ->  Seq Scan on label2buildfailure l2bf  (cost=0.00..28.50 rows=1850 width=16) (actual time=0.617..0.617 rows=1 loops=1)
                 ->  Hash  (cost=8.39..8.39 rows=3 width=16) (actual time=0.017..0.017 rows=0 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 8kB
                       ->  Index Scan using buildid17 on buildfailure bf  (cost=0.29..8.39 rows=3 width=16) (actual time=0.017..0.017 rows=0 loops=1)
                             Index Cond: (buildid = '1861237'::bigint)
           ->  Index Scan using buildfailuredetails_pkey on buildfailuredetails bfd  (cost=0.29..8.31 rows=1 width=8) (never executed)
                 Index Cond: (id = bf.detailsid)
                 Filter: (type = '0'::smallint)
   SubPlan 5
     ->  Nested Loop  (cost=8.71..52.19 rows=1 width=8) (actual time=0.019..0.019 rows=0 loops=1)
           ->  Hash Join  (cost=8.42..43.87 rows=1 width=16) (actual time=0.018..0.018 rows=0 loops=1)
                 Hash Cond: (l2bf_1.buildfailureid = bf_1.id)
                 ->  Seq Scan on label2buildfailure l2bf_1  (cost=0.00..28.50 rows=1850 width=16) (actual time=0.002..0.002 rows=1 loops=1)
                 ->  Hash  (cost=8.39..8.39 rows=3 width=16) (actual time=0.002..0.002 rows=0 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 8kB
                       ->  Index Scan using buildid17 on buildfailure bf_1  (cost=0.29..8.39 rows=3 width=16) (actual time=0.002..0.002 rows=0 loops=1)
                             Index Cond: (buildid = '1861237'::bigint)
           ->  Index Scan using buildfailuredetails_pkey on buildfailuredetails bfd_1  (cost=0.29..8.31 rows=1 width=8) (never executed)
                 Index Cond: (id = bf_1.detailsid)
                 Filter: (type = '1'::smallint)
   SubPlan 6
     ->  Hash Join  (cost=420.88..236914.16 rows=1285 width=8) (actual time=1806.427..1806.427 rows=0 loops=1)
           Hash Cond: (l2da.dynamicanalysisid = da.id)
           ->  Seq Scan on label2dynamicanalysis l2da  (cost=0.00..190185.40 rows=12345340 width=16) (actual time=0.201..1166.732 rows=12344045 loops=1)
           ->  Hash  (cost=412.66..412.66 rows=658 width=4) (actual time=0.043..0.043 rows=1 loops=1)
                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                 ->  Index Scan using buildid6 on dynamicanalysis da  (cost=0.43..412.66 rows=658 width=4) (actual time=0.038..0.038 rows=1 loops=1)
                       Index Cond: (buildid = '1861237'::bigint)
 Planning time: 1.370 ms
 Execution time: 80963.710 ms
(49 rows)

vlebourl avatar Mar 04 '24 15:03 vlebourl

Based on those results, an index on label2test and label2dynamicanalysis would probably speed things up substantially.

williamjallen avatar Mar 06 '24 15:03 williamjallen

I added both indexes and the request is now almost instantaneous. Thanks so much for your help! Shall I consider this closed or do you wanna add those indexes by default in CDash?

vlebourl avatar Mar 07 '24 14:03 vlebourl

We'd like to include these indexes in CDash, so this issue can stay open until that happens. I'll happily review any PRs you create to upstream them. Otherwise, I'll add it to my todo list.

williamjallen avatar Mar 07 '24 14:03 williamjallen

Closed by #2103 and #2104.

williamjallen avatar Apr 01 '24 13:04 williamjallen