chainloop icon indicating copy to clipboard operation
chainloop copied to clipboard

Add indexes to workflows and workflow runs

Open javirln opened this issue 1 year ago • 3 comments

There are some queries that can be improved in terms of performance by adding some indexes, for example the retrieval of workflow runs based on a workflow, currently, the SQL plan looks like to retrieve last 3 workflows runs of a workflow:

SELECT "workflow_runs"."id",
               "workflow_runs"."created_at",
               "workflow_runs"."finished_at",
               "workflow_runs"."state",
               "workflow_runs"."reason",
               "workflow_runs"."run_url",
               "workflow_runs"."runner_type",
               "workflow_runs"."attestation",
               "workflow_runs"."attestation_digest",
               "workflow_runs"."attestation_state",
               "workflow_runs"."contract_revision_used",
               "workflow_runs"."contract_revision_latest"
        FROM "workflow_runs"
        WHERE EXISTS (SELECT "workflows"."id"
                      FROM "workflows"
                      WHERE "workflow_runs"."workflow_workflowruns" = "workflows"."id"
                        AND "workflows"."id" = 'f09a517b-6975-489b-956e-14446b30b0d5')
        ORDER BY "workflow_runs"."created_at" DESC
        LIMIT 3
Limit  (cost=1970.54..1970.55 rows=3 width=229)
  ->  Sort  (cost=1970.54..1970.55 rows=3 width=229)
        Sort Key: workflow_runs.created_at DESC
        ->  Nested Loop  (cost=0.00..1970.52 rows=3 width=229)
              ->  Seq Scan on workflows  (cost=0.00..6.28 rows=1 width=16)
                    Filter: (id = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)
              ->  Seq Scan on workflow_runs  (cost=0.00..1964.21 rows=3 width=245)
                    Filter: (workflow_workflowruns = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)

Just by adding an index on the workflow_workflowruns, improves:

Limit  (cost=18.43..18.44 rows=3 width=229)
  ->  Sort  (cost=18.43..18.44 rows=3 width=229)
        Sort Key: workflow_runs.created_at DESC
        ->  Nested Loop  (cost=0.29..18.40 rows=3 width=229)
              ->  Seq Scan on workflows  (cost=0.00..6.28 rows=1 width=16)
                    Filter: (id = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)
              ->  Index Scan using workflowrun_workflow on workflow_runs  (cost=0.29..12.10 rows=3 width=245)
                    Index Cond: (workflow_workflowruns = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)

javirln avatar Aug 12 '24 07:08 javirln

Great find @javirln, could you point out to the code that's making these calls? I'd also like to review that those queries are properly formed. wdyt?

migmartri avatar Aug 12 '24 07:08 migmartri

btw, since foreign keys do not mean indexes, we should probably look into some other potential indexes.

Anything related to organizations should probably be indexed, especially the tables that have a lot of entries, like casbackend_mappings

make -C app/controlplane visualize-data-model 

image

should we spend some time evaluating the relationships we have here and creating indexes accordingly?

migmartri avatar Aug 12 '24 13:08 migmartri

Totally, I would go through the queries and try to identify low hanging fruits to improve

javirln avatar Aug 13 '24 06:08 javirln