incubator-devlake icon indicating copy to clipboard operation
incubator-devlake copied to clipboard

Request for Database Retention Policy to Manage Exponentially Growing Data Size

Open naveenmanukonda opened this issue 1 year ago • 5 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Use case

Description:

We have observed that the data size in the database is growing exponentially. This raises concerns about database performance, storage costs, and scalability.

Proposal: To manage this, we propose applying a data retention policy. The retention policy would involve automatically deleting old data after a specified time period. This policy could help:

  • Optimize database performance
  • Reduce storage costs
  • Ensure long-term scalability

Request: Please evaluate the following:

  • Do we currently have any retention policies in place?
  • If not, could we implement a retention policy to delete data older than [X months/years]?
  • What would be the impact on performance and system integrity?

Description

No response

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

naveenmanukonda avatar Sep 13 '24 03:09 naveenmanukonda

Hey @naveenmanukonda! 👋 I'm here to help you with any bugs, questions, or becoming a contributor. Let's tackle this issue together!

Currently, there are no existing database retention policies implemented in the codebase [1].

Data in the system is managed by deleting existing data and then inserting new data, which ensures that the latest data is always available. However, this approach can lead to rapid increases in MySQL disk consumption. To manage this, old binary logs can be purged manually or automated to manage disk consumption [1][2][3].

For implementing a retention policy to delete data older than a specified period, you would need to:

  1. Define the Retention Period: Determine the time period after which data should be deleted (e.g., X months/years).
  2. Automate Data Deletion: Implement scripts or scheduled jobs (e.g., cron jobs) to periodically delete data older than the defined retention period.
  3. Evaluate Impact: Assess the impact on performance and system integrity, ensuring that critical data is not lost and that the system remains stable.

If you are willing to submit a PR, you could start by creating a script that identifies and deletes old data based on the defined retention period. Additionally, consider adding configuration options to allow users to customize the retention period according to their needs.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 13 '24 03:09 dosubot[bot]

@dosu what would be the size of binary logs how could we purge them

naveenmanukonda avatar Sep 13 '24 03:09 naveenmanukonda

The size of binary logs can rapidly increase due to the large size of the binary logs generated after each data collection cycle in DevLake.

To purge old binary logs, follow these steps:

  1. Connect to your MySQL server using the MySQL client or any other database management tool such as PhpMyAdmin, MySQL Workbench, etc.

  2. Check the current status of your binary log files:

    SHOW BINARY LOGS;
    

    This will display a list of all the binary log files that are currently available on your MySQL server.

  3. Determine the last binary log file that you want to keep. This is the file that you want to retain for any future point-in-time recovery or replication purposes.

  4. Purge all binary logs that are older than the binary log file that you want to retain:

    PURGE BINARY LOGS BEFORE 'DATE';
    

    Replace DATE with the specific date and time up to which you want to purge the binary logs, formatted as 'YYYY-MM-DD hh:mm:ss'. For example:

    PURGE BINARY LOGS BEFORE '2023-03-22 15:30:00';
    
  5. Verify the purge was successful by running the SHOW BINARY LOGS; command again.

Automating the Purge of Old Binary Logs

  1. Set the expire_logs_days global variable to the number of days that you want to keep binary logs for:
    SET GLOBAL expire_logs_days = 1;
    

Skipping Binary Logs

  1. Set the skip-log-bin configuration option directly in the docker-compose.yaml file:

    services:
      mysql:
        image: mysql:8
        volumes:
          - mysql-storage:/var/lib/mysql
        restart: always
        ports:
          - "127.0.0.1:3306:3306"
        environment:
          MYSQL_ROOT_PASSWORD: admin
          MYSQL_DATABASE: lake
          MYSQL_USER: merico
          MYSQL_PASSWORD: merico
        command:
          --character-set-server=utf8mb4
          --collation-server=utf8mb4_bin
          --skip-log-bin
    
  2. Restart the MySQL container:

    docker-compose restart mysql
    

These steps will help you manage the size of binary logs effectively [1][2][3].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 13 '24 04:09 dosubot[bot]

@dosu binary file sizes are harly in MB, does this deletion will help in creating free space in database?

naveenmanukonda avatar Sep 13 '24 04:09 naveenmanukonda

No, deleting binary files that are only a few MB in size will not significantly help create free space in the database. The rapid increase in MySQL disk consumption is primarily caused by the large size of the binary logs generated after each data collection cycle. To manage disk space effectively, you should consider purging old binary logs or setting up automated purging.

For example, you can purge old binary logs with the following command:

PURGE BINARY LOGS BEFORE '2023-03-22 15:30:00';

Or set up automated purging by configuring the expire_logs_days global variable:

SET GLOBAL expire_logs_days = 1;

This approach will help manage the disk space more effectively [1].

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 13 '24 04:09 dosubot[bot]

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Nov 13 '24 00:11 github-actions[bot]

This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.

github-actions[bot] avatar Nov 20 '24 00:11 github-actions[bot]