`pod-scaler`: Begin to store data in v2 version including timestamp

Open smg247 opened this issue 1 year ago • 6 comments

For DPTP-4069, this PR represents step 1 below:

Upon further research, it has been discovered that memory usage in the consumers has scaled fairly linearly with the overall amount of data stored in the GCS buckets. Despite each individual datum being pruned to 25 entries, we have still seen a significant increase in the amount of data stored. As of today, it is well over 3GB in total. This is due to loading and storing usage data for potentially stale identifiers since the inception of this tool (2021). The approach to fixing this problem involves pruning the stale data, which should result in the consumers using significantly less memory. Unfortunately, there is currently no way to tell which data is stale and which is newly generated. Because of this, we will have to begin storing new data with timestamps included. Eventually, we will prune data containing timestamps older than a configurable age (beginning at 180 days). The following phases will be taken to migrate from the existing data format to the new format with pruning:

  1. Begin to store data in the new format in a new bucket "origin-ci-resource-usage-data-v2", as well as continuing to store data using the existing format in the existing bucket "origin-ci-resource-usage-data"
  2. After some time (~30 days) the consumers will be migrated to use the v2 format
  3. Upon verifying that the consumers function as designed using the v2 format, the v1 logic and data will be deleted, and the code will be simplified to use only the v2 logic
  4. Add pruning logic to prune data older than the configured time
  5. If necessary, plans for using a persistent datastore can be made and executed, but I believe that the prior steps will make this unnecessary
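The pruning planned for step 4 could look roughly like the sketch below. It is only an illustration under stated assumptions: `TimestampedEntry`, `pruneStale`, and `DefaultMaxAge` are hypothetical names that do not come from this PR, and the real v2 types may attach timestamps differently.

```go
package main

import (
	"fmt"
	"time"
)

// DefaultMaxAge mirrors the initial 180-day retention described above.
// The name is hypothetical; the PR only states that the age is configurable.
const DefaultMaxAge = 180 * 24 * time.Hour

// TimestampedEntry is a hypothetical stand-in for a v2 record: usage data
// for one identifier plus the time at which it was stored.
type TimestampedEntry struct {
	Identifier string
	AddedAt    time.Time
}

// pruneStale returns only the entries whose timestamp is not older than maxAge
// relative to now. The input slice is left unmodified.
func pruneStale(entries []TimestampedEntry, now time.Time, maxAge time.Duration) []TimestampedEntry {
	cutoff := now.Add(-maxAge)
	kept := make([]TimestampedEntry, 0, len(entries))
	for _, e := range entries {
		if !e.AddedAt.Before(cutoff) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	now := time.Date(2024, time.July, 24, 0, 0, 0, 0, time.UTC)
	entries := []TimestampedEntry{
		{Identifier: "stale-job", AddedAt: now.AddDate(0, 0, -200)},
		{Identifier: "fresh-job", AddedAt: now.AddDate(0, 0, -10)},
	}
	// Only entries stored within the last 180 days survive.
	for _, e := range pruneStale(entries, now, DefaultMaxAge) {
		fmt.Println(e.Identifier)
	}
}
```

Passing `now` in as a parameter, rather than calling `time.Now()` inside the function, keeps the cutoff deterministic and easy to test.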

In order to achieve this functionality I have created v2 producer logic and data types based on v1. This was largely a copy/paste job followed by minor changes to the types and logic. During step 3 the v1 logic and types will be removed, and the prior module structure will be restored.
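As a rough illustration of the dual-write idea in step 1, the sketch below wraps a v1-style record in a v2 record that carries a timestamp and writes both. Only the two bucket names come from this description. The types `usageV1` and `usageV2`, the in-memory `memoryStore`, and `storeBoth` are hypothetical and are not the actual ci-tools types or GCS client code.

```go
package main

import (
	"fmt"
	"time"
)

// Bucket names as given in the PR description.
const (
	bucketV1 = "origin-ci-resource-usage-data"
	bucketV2 = "origin-ci-resource-usage-data-v2"
)

// usageV1 is a hypothetical stand-in for the existing record shape.
type usageV1 struct {
	Identifier string
	Samples    []float64
}

// usageV2 is a hypothetical v2 shape: the v1 fields plus a timestamp
// that later pruning could use to detect stale data.
type usageV2 struct {
	usageV1
	LastUpdated time.Time
}

// memoryStore is an in-memory stand-in for bucket storage, keyed by
// bucket name and then by object key. Real code would use a GCS client.
type memoryStore map[string]map[string]any

func (s memoryStore) put(bucket, key string, value any) {
	if s[bucket] == nil {
		s[bucket] = map[string]any{}
	}
	s[bucket][key] = value
}

// storeBoth writes the record in the existing format to the v1 bucket and
// in the timestamped format to the v2 bucket.
func storeBoth(s memoryStore, rec usageV1, now time.Time) {
	s.put(bucketV1, rec.Identifier, rec)
	s.put(bucketV2, rec.Identifier, usageV2{usageV1: rec, LastUpdated: now})
}

func main() {
	s := memoryStore{}
	storeBoth(s, usageV1{Identifier: "job-a", Samples: []float64{1.5, 2.0}}, time.Now())
	fmt.Println(len(s[bucketV1]), len(s[bucketV2]))
}
```

Embedding the v1 struct keeps the two formats in lockstep while both are written, which matches the intent of dropping v1 later without changing the v2 fields.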

Note to reviewer(s): the 2nd and 3rd commits will be the most useful to review here.

smg247 • Jul 24 '24 20:07

/hold as I will have to coordinate a small update to the producer deployment with this

smg247 • Jul 24 '24 20:07

/test e2e

smg247 • Jul 24 '24 20:07

/test e2e

smg247 • Jul 24 '24 21:07

/test e2e

smg247 • Jul 25 '24 00:07

@smg247: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | 3c150d7ce1931257cb8558a7448cd69aaa13b711 | link | false | `/test security` |


openshift-ci[bot] • Jul 25 '24 02:07

/lgtm

bear-redhat • Aug 06 '24 16:08

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bear-redhat, smg247

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [bear-redhat,smg247]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci[bot] • Aug 06 '24 16:08

/hold cancel

smg247 • Aug 06 '24 17:08