pact_broker icon indicating copy to clipboard operation
pact_broker copied to clipboard

Performance of Show Latest Tags page

Open ertrzyiks opened this issue 3 years ago • 15 comments

When I go to Show Latest Tags broker UI takes a lot of time to load and may crash with out of memory error.

Screenshot 2020-11-27 at 16 56 38

matrix page hide all tags page show all tags page
500ms 140ms 3.4s

The SQL log shows that it performs

SELECT * FROM "latest_verifications_for_consumer_version_tags"

and this table has 18000 rows on my machine at the moment.

If I get it right it comes from this part

https://github.com/pact-foundation/pact_broker/blob/93a199826e6251a7f2ca77b4702f69c6f8d8ae06/lib/pact_broker/index/service.rb#L172-L187

when we have query param tags=true.

Pre issue-raising checklist

  • [x] Upgraded to the latest Pact Broker OR
  • [ ] Checked the CHANGELOG to see if the issue I am about to raise has been fixed
  • [ ] Created an executable example that demonstrates the issue using either a:
    • Dockerfile
    • Git repository with a Travis or Appveyor (or similar) build

Software versions

  • pact-broker gem version: n/a
  • pact-broker docker version: pactfoundation/pact-broker:2.69.0.0
  • OS: e.g. Mac OSX 10.15.4
  • pact broker client details: n/a

Expected behaviour

Performance of the tags page http://localhost:9292/?tags=true does not depend on number of pacts in DB.

Actual behaviour

If we have a lot of contracts in the database loading http://localhost:9292/?tags=true takes a lot of time or crashes server with out of memory error.

Steps to reproduce

Fill broker with a lot of version and tags and visit http://localhost:9292/?tags=true.

Relevant log files

pact-broker_1  | 2020-11-27 15:54:05.285692 D [8:puma 002] pact-broker -- (0.000447s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:05.636016 D [8:puma 002] pact-broker -- (0.340052s) SELECT * FROM "latest_verifications_for_consumer_version_tags"
pact-broker_1  | 2020-11-27 15:54:07.176131 D [8:puma 002] pact-broker -- (0.000513s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.176691 D [8:puma 002] pact-broker -- (0.000415s) SELECT * FROM "versions" WHERE ("versions"."id" IN (62))
pact-broker_1  | 2020-11-27 15:54:07.182256 D [8:puma 002] pact-broker -- (0.000374s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.195041 D [8:puma 002] pact-broker -- (0.012675s) SELECT "id" FROM "latest_pact_publications"
pact-broker_1  | 2020-11-27 15:54:07.195538 D [8:puma 002] pact-broker -- (0.000246s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.196160 D [8:puma 002] pact-broker -- (0.000460s) SELECT DISTINCT "consumer_id", "provider_id" FROM "webhooks"
pact-broker_1  | 2020-11-27 15:54:07.197281 D [8:puma 002] pact-broker -- (0.000355s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.347075 D [8:puma 002] pact-broker -- (0.149616s) SELECT count(*) AS "count" FROM (SELECT "id", "consumer_name", "provider_name", "consumer_version_order" FROM "latest_pact_publications" UNION (SELECT "id", "consumer_name", "provider_name", "consumer_version_order" FROM "latest_tagged_pact_publications")) AS "t1" LIMIT 1
pact-broker_1  | 2020-11-27 15:54:07.347969 D [8:puma 002] pact-broker -- (0.000295s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.526938 D [8:puma 002] pact-broker -- (0.178866s) SELECT "pact_publications".* FROM "pact_publications" WHERE ("id" IN (SELECT "id" FROM (SELECT "id", "consumer_name", "provider_name", "consumer_version_order" FROM "latest_pact_publications" UNION (SELECT "id", "consumer_name", "provider_name", "consumer_version_order" FROM "latest_tagged_pact_publications")) AS "t1" ORDER BY lower("consumer_name") ASC, "consumer_version_order" DESC, lower("provider_name") ASC LIMIT 30 OFFSET 18240))
pact-broker_1  | 2020-11-27 15:54:07.529363 D [8:puma 002] pact-broker -- (0.000234s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.530428 D [8:puma 002] pact-broker -- (0.000877s) SELECT * FROM "pacticipants" WHERE ("pacticipants"."id" IN (18))
pact-broker_1  | 2020-11-27 15:54:07.531182 D [8:puma 002] pact-broker -- (0.000227s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.531842 D [8:puma 002] pact-broker -- (0.000507s) SELECT * FROM "pacticipants" WHERE ("pacticipants"."id" IN (19))
pact-broker_1  | 2020-11-27 15:54:07.532327 D [8:puma 002] pact-broker -- (0.000148s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.533193 D [8:puma 002] pact-broker -- (0.000757s) SELECT * FROM "pact_versions" WHERE ("pact_versions"."id" IN (41))
pact-broker_1  | 2020-11-27 15:54:07.533907 D [8:puma 002] pact-broker -- (0.000214s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.540767 D [8:puma 002] pact-broker -- (0.006799s) SELECT * FROM "integrations" WHERE (("integrations"."consumer_id", "integrations"."provider_id") IN ((18, 19)))
pact-broker_1  | 2020-11-27 15:54:07.541277 D [8:puma 002] pact-broker -- (0.000155s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.542126 D [8:puma 002] pact-broker -- (0.000773s) SELECT * FROM "latest_verifications_for_consumer_and_provider" WHERE (("latest_verifications_for_consumer_and_provider"."consumer_id", "latest_verifications_for_consumer_and_provider"."provider_id") IN ((18, 19)))
pact-broker_1  | 2020-11-27 15:54:07.542720 D [8:puma 002] pact-broker -- (0.000150s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.543157 D [8:puma 002] pact-broker -- (0.000296s) SELECT * FROM "versions" WHERE ("versions"."id" IN (62))
pact-broker_1  | 2020-11-27 15:54:07.544312 D [8:puma 002] pact-broker -- (0.000150s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.551152 D [8:puma 002] pact-broker -- (0.006763s) SELECT * FROM "latest_triggered_webhooks" WHERE (("latest_triggered_webhooks"."consumer_id", "latest_triggered_webhooks"."provider_id") IN ((18, 19)))
pact-broker_1  | 2020-11-27 15:54:07.551736 D [8:puma 002] pact-broker -- (0.000167s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.552296 D [8:puma 002] pact-broker -- (0.000366s) SELECT * FROM "versions" WHERE ("versions"."id" IN (524, 531, 540, 514, 511, 536, 516, 534, 520, 525, 523, 512, 519, 522, 535, 528, 533, 518, 513, 530, 515, 517, 521, 532, 527, 539, 526, 529, 537, 538))
pact-broker_1  | 2020-11-27 15:54:07.558233 D [8:puma 002] pact-broker -- (0.000343s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.559132 D [8:puma 002] pact-broker -- (0.000785s) SELECT * FROM "latest_verifications_for_pact_versions" WHERE ("latest_verifications_for_pact_versions"."pact_version_id" IN (41))
pact-broker_1  | 2020-11-27 15:54:07.559713 D [8:puma 002] pact-broker -- (0.000232s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.560273 D [8:puma 002] pact-broker -- (0.000433s) SELECT * FROM "versions" WHERE ("versions"."id" IN (62))
pact-broker_1  | 2020-11-27 15:54:07.561081 D [8:puma 002] pact-broker -- (0.000288s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.576066 D [8:puma 002] pact-broker -- (0.014893s) SELECT * FROM "tags_with_latest_flag" WHERE ("tags_with_latest_flag"."version_id" IN (62))
pact-broker_1  | 2020-11-27 15:54:07.576886 D [8:puma 002] pact-broker -- (0.000347s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.596826 D [8:puma 002] pact-broker -- (0.019782s) SELECT * FROM "head_pact_tags" WHERE ("head_pact_tags"."pact_publication_id" IN (515, 522, 531, 505, 502, 527, 507, 525, 511, 516, 514, 503, 510, 513, 526, 519, 524, 509, 504, 521, 506, 508, 512, 523, 518, 530, 517, 520, 528, 529))
pact-broker_1  | 2020-11-27 15:54:07.644012 D [8:puma 002] pact-broker -- (0.000454s) SELECT NULL
pact-broker_1  | 2020-11-27 15:54:07.658064 D [8:puma 002] pact-broker -- (0.013951s) SELECT * FROM "tags_with_latest_flag" WHERE ("tags_with_latest_flag"."version_id" = 62)

ertrzyiks avatar Nov 27 '20 16:11 ertrzyiks

Hi @ertrzyiks On what hardware are you running the broker and the database?

anto-ac avatar Nov 27 '20 16:11 anto-ac

Hi @anto-ac , we run broker on machine with 2gb of ram and the container is killed with OOMKilled reason when we visit ?tags=true page. Database is postgres.

ertrzyiks avatar Nov 27 '20 17:11 ertrzyiks

@ertrzyiks Does the database run as a container on the same machine as the broker or is it a separate instance?

anto-ac avatar Nov 27 '20 22:11 anto-ac

Ok, I've been working on a clean up feature that will run at regular intervals to remove unnecessary data. I was hoping to get it out this weekend, but I've been testing it on Anto's data and it can't cope! I think I'm going to change my approach to something more incremental. There is a "blunt force" manual clean up script here that will help you in the mean time.

bethesque avatar Nov 28 '20 04:11 bethesque

When I do clean-up using the blunt force script I have to do the last delete (delete versions) in batches, otherwise the query basically never ends. 😅

anto-ac avatar Nov 28 '20 06:11 anto-ac

@anto-ac database is a separate instance from cloudsql.

@bethesque I was curious if we ever need old data, like tags related to branches that were already merged. We never remove such. As you mention a clean up script you also think the broker should only store the most recent data, right?

ertrzyiks avatar Nov 28 '20 07:11 ertrzyiks

You can always get broker data back by re-running builds. I think this solution will work for you though, because it lets you keep the "head" versions (the latest version for each pacticipant/tag) regardless of age, and it's rare than you ever need a version from a feature branch that is not the latest one for its tag.

I'd love you guys to beta test this out for me with your own data set in the postgres database: https://github.com/pact-foundation/pact-broker-docker/blob/master/docker-compose-dev.yml

These are the relevant vars. You can play around with them to see what works for you.

      PACT_BROKER_DATABASE_BETA_CLEAN_ENABLED: "true"
      PACT_BROKER_DATABASE_CLEAN_CRON_SCHEDULE: "* * * * *"
      PACT_BROKER_DATABASE_CLEAN_DRY_RUN: "false"
      PACT_BROKER_DATABASE_CLEAN_VERSION_DELETION_LIMIT: "500"
      # Keep all prod versions, AND the latest version for every pacticipant/tag, and all versions less than 30 days old
      PACT_BROKER_DATABASE_CLEAN_KEEP_VERSION_SELECTORS: '[{"tag":"production"}, {"latest": true, "tag": true}, {"max_age": 30}]'

When ${PACT_BROKER_DATABASE_BETA_CLEAN_ENABLED} is true, a cron job will run on the schedule that you configure, that takes the oldest $PACT_BROKER_DATABASE_CLEAN_VERSION_DELETION_LIMIT versions (excluding the ones that you want to keep, configured in $PACT_BROKER_DATABASE_CLEAN_KEEP_VERSION_SELECTORS), and deletes all data associated with them. I've found that using Anto's redacted dataset on my Macbook pro, deleting 500 pacticipant versions worth of data took about 35 seconds. The recommendation for really big datasets would be to set a schedule like "delete 500 versions every 5 minutes" overnight, and by morning the database should only contain the stuff you want. Then you could dial it back to running once a night.

The dry run feature allows you to see what would be deleted if you had it enabled. Obviously, it'll print the same thing over and over again until you actually enable it, but I hope it helps give you confidence to turn the feature on for reals. At the moment it just lists what is being deleted, but I want to add a summary of what is being kept as well, so you can be doubly confident that, for example, all your prod versions aren't going to disappear.

Let me know how you go, and whether or not you think this will solve the data problem for you both, and if you have a wish list for any additions. On the todo list is to also be able to get rid of a bunch of the triggered webhook data, as you generally only need the last few executions.

bethesque avatar Nov 28 '20 10:11 bethesque

Oh, the things you can put into a selector are:

  • pacticipant: "name string"
  • latest: true
  • tag: "name string" or true
  • max_age: number of days

They can be used in any combination, except latest and max_age are mutually exclusive.

Examples:

  • "all production versions" { "tag" : " production" }
  • "latest version for each pacticipant" { "latest": true }
  • "latest version for each pacticipant/tag" { "latest": true, "tag": true }
  • "all versions less than 30 days old" {"max_age": 30}
  • "all versions for Foo app" {"pacticipant": "Foo"}
  • "all main versions for Foo app" {"pacticipant": "Foo", "tag": "main"}
  • "all main versions less than 30 days old" {"max_age": 30, "tag": "main"}

The selectors combine by AND, so [{"max_age": 30}, { "latest": true, tag: true }, { "tag": "production" }] would be "keep everything that's less than 30 days old AND keep the latest version for every pacticipant/tag AND keep all the production versions".

bethesque avatar Nov 28 '20 10:11 bethesque

Thanks a lot @bethesque , that's very helpful 👍

ertrzyiks avatar Nov 30 '20 12:11 ertrzyiks

First of all, @bethesque, thank you for this clean up procedure. It was something very much needed 🙇.

The selectors combine by AND, so [{"max_age": 30}, { "latest": true, tag: true }, { "tag": "production" }] would be "keep everything that's less than 30 days old AND keep the latest version for every pacticipant/tag AND keep all the production versions".

I'm a bit hesitant of the actual meaning of "AND" in that statement 😅. Is it a logical AND or is it a logical OR instead? If it was a logical AND, it would mean that the only things kept would be "the latest production versions less than 30 days old" (all the selectors should be true, logical and behaviour). A logical OR, on the other hand, would mean that a tag is kept if it's 30 days old or the latest tag or a production tag (any of those selectors is true, logical or behaviour). What is the truth?

victorlopezsalazar avatar Jan 08 '21 09:01 victorlopezsalazar

You make an excellent point! It's a logical OR.

bethesque avatar Jan 11 '21 22:01 bethesque

Double check the env vars in the source if you're trying it out, as I think one of them changed since I gave the initial example.

I was in the middle of writing the docs during my last OSS day, but didn't get them done in time to finish. The whole thing gets complicated if there is more than one broker application instance running at a time. I'm going to have to make an endpoint that can be hit on a regular basis to trigger the clean job, otherwise, all the instances will start cleaning at once.

bethesque avatar Jan 11 '21 22:01 bethesque

Thanks for pointing that out @bethesque! It seems the docs references to the wrong variable:PACT_BROKER_DATABASE_CLEAN_ENABLED, whilst in the Dockerfile still looks for PACT_BROKER_DATABASE_BETA_ CLEAN_ENABLED.

victorlopezsalazar avatar Jan 12 '21 08:01 victorlopezsalazar

I'll get those docs updated next OSS day @victorlopezsalazar. It got complicated trying to work out what to do in an environment where there are multiple nodes.

bethesque avatar Jan 21 '21 22:01 bethesque

Leaving this here for my future reference https://www.endpoint.com/blog/2020/06/30/postgresql-improve-group-by-max-performance

bethesque avatar Jan 21 '21 22:01 bethesque