Request for Database Retention Policy to Manage Exponentially Growing Data Size
Search before asking
- [X] I had searched in the issues and found no similar feature requirement.
Use case
Description:
We have observed that the data size in the database is growing exponentially. This raises concerns about database performance, storage costs, and scalability.
Proposal: To manage this, we propose applying a data retention policy. The retention policy would involve automatically deleting old data after a specified time period. This policy could help:
- Optimize database performance
- Reduce storage costs
- Ensure long-term scalability
Request: Please evaluate the following:
- Do we currently have any retention policies in place?
- If not, could we implement a retention policy to delete data older than [X months/years]?
- What would be the impact on performance and system integrity?
Description
No response
Related issues
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Hey @naveenmanukonda! 👋 I'm here to help you with any bugs, questions, or becoming a contributor. Let's tackle this issue together!
Currently, there are no existing database retention policies implemented in the codebase [1].
Data in the system is managed by deleting existing data and then inserting new data, which ensures that the latest data is always available. However, this approach can lead to rapid increases in MySQL disk consumption. To manage this, old binary logs can be purged manually or automated to manage disk consumption [1][2][3].
For implementing a retention policy to delete data older than a specified period, you would need to:
- Define the Retention Period: Determine the time period after which data should be deleted (e.g., X months/years).
- Automate Data Deletion: Implement scripts or scheduled jobs (e.g., cron jobs) to periodically delete data older than the defined retention period.
- Evaluate Impact: Assess the impact on performance and system integrity, ensuring that critical data is not lost and that the system remains stable.
If you are willing to submit a PR, you could start by creating a script that identifies and deletes old data based on the defined retention period. Additionally, consider adding configuration options to allow users to customize the retention period according to their needs.
To continue talking to Dosu, mention @dosu.
@dosu what would be the size of binary logs how could we purge them
The size of binary logs can rapidly increase due to the large size of the binary logs generated after each data collection cycle in DevLake.
To purge old binary logs, follow these steps:
-
Connect to your MySQL server using the MySQL client or any other database management tool such as PhpMyAdmin, MySQL Workbench, etc.
-
Check the current status of your binary log files:
SHOW BINARY LOGS;This will display a list of all the binary log files that are currently available on your MySQL server.
-
Determine the last binary log file that you want to keep. This is the file that you want to retain for any future point-in-time recovery or replication purposes.
-
Purge all binary logs that are older than the binary log file that you want to retain:
PURGE BINARY LOGS BEFORE 'DATE';Replace
DATEwith the specific date and time up to which you want to purge the binary logs, formatted as 'YYYY-MM-DD hh:mm:ss'. For example:PURGE BINARY LOGS BEFORE '2023-03-22 15:30:00'; -
Verify the purge was successful by running the
SHOW BINARY LOGS;command again.
Automating the Purge of Old Binary Logs
-
Set the
expire_logs_daysglobal variable to the number of days that you want to keep binary logs for:SET GLOBAL expire_logs_days = 1;
Skipping Binary Logs
-
Set the
skip-log-binconfiguration option directly in thedocker-compose.yamlfile:services: mysql: image: mysql:8 volumes: - mysql-storage:/var/lib/mysql restart: always ports: - "127.0.0.1:3306:3306" environment: MYSQL_ROOT_PASSWORD: admin MYSQL_DATABASE: lake MYSQL_USER: merico MYSQL_PASSWORD: merico command: --character-set-server=utf8mb4 --collation-server=utf8mb4_bin --skip-log-bin -
Restart the MySQL container:
docker-compose restart mysql
These steps will help you manage the size of binary logs effectively [1][2][3].
To continue talking to Dosu, mention @dosu.
@dosu binary file sizes are harly in MB, does this deletion will help in creating free space in database?
No, deleting binary files that are only a few MB in size will not significantly help create free space in the database. The rapid increase in MySQL disk consumption is primarily caused by the large size of the binary logs generated after each data collection cycle. To manage disk space effectively, you should consider purging old binary logs or setting up automated purging.
For example, you can purge old binary logs with the following command:
PURGE BINARY LOGS BEFORE '2023-03-22 15:30:00';
Or set up automated purging by configuring the expire_logs_days global variable:
SET GLOBAL expire_logs_days = 1;
This approach will help manage the disk space more effectively [1].
To continue talking to Dosu, mention @dosu.
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.