chainloop Add indexes to workflows and workflow runs

There are some queries that can be improved in terms of performance by adding some indexes, for example the retrieval of workflow runs based on a workflow, currently, the SQL plan looks like to retrieve last 3 workflows runs of a workflow:

SELECT "workflow_runs"."id",
               "workflow_runs"."created_at",
               "workflow_runs"."finished_at",
               "workflow_runs"."state",
               "workflow_runs"."reason",
               "workflow_runs"."run_url",
               "workflow_runs"."runner_type",
               "workflow_runs"."attestation",
               "workflow_runs"."attestation_digest",
               "workflow_runs"."attestation_state",
               "workflow_runs"."contract_revision_used",
               "workflow_runs"."contract_revision_latest"
        FROM "workflow_runs"
        WHERE EXISTS (SELECT "workflows"."id"
                      FROM "workflows"
                      WHERE "workflow_runs"."workflow_workflowruns" = "workflows"."id"
                        AND "workflows"."id" = 'f09a517b-6975-489b-956e-14446b30b0d5')
        ORDER BY "workflow_runs"."created_at" DESC
        LIMIT 3

Limit  (cost=1970.54..1970.55 rows=3 width=229)
  ->  Sort  (cost=1970.54..1970.55 rows=3 width=229)
        Sort Key: workflow_runs.created_at DESC
        ->  Nested Loop  (cost=0.00..1970.52 rows=3 width=229)
              ->  Seq Scan on workflows  (cost=0.00..6.28 rows=1 width=16)
                    Filter: (id = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)
              ->  Seq Scan on workflow_runs  (cost=0.00..1964.21 rows=3 width=245)
                    Filter: (workflow_workflowruns = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)

Just by adding an index on the workflow_workflowruns, improves:

Limit  (cost=18.43..18.44 rows=3 width=229)
  ->  Sort  (cost=18.43..18.44 rows=3 width=229)
        Sort Key: workflow_runs.created_at DESC
        ->  Nested Loop  (cost=0.29..18.40 rows=3 width=229)
              ->  Seq Scan on workflows  (cost=0.00..6.28 rows=1 width=16)
                    Filter: (id = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)
              ->  Index Scan using workflowrun_workflow on workflow_runs  (cost=0.29..12.10 rows=3 width=245)
                    Index Cond: (workflow_workflowruns = 'f09a517b-6975-489b-956e-14446b30b0d5'::uuid)

Aug 12 '24 07:08 javirln

Great find @javirln, could you point out to the code that's making these calls? I'd also like to review that those queries are properly formed. wdyt?

Aug 12 '24 07:08 migmartri

btw, since foreign keys do not mean indexes, we should probably look into some other potential indexes.

Anything related to organizations should probably be indexed, especially the tables that have a lot of entries, like casbackend_mappings

make -C app/controlplane visualize-data-model

should we spend some time evaluating the relationships we have here and creating indexes accordingly?

Aug 12 '24 13:08 migmartri

Totally, I would go through the queries and try to identify low hanging fruits to improve

Aug 13 '24 06:08 javirln

chainloop chainloop copied to clipboard

Add indexes to workflows and workflow runs

chainloop
chainloop copied to clipboard