DropOldContentVersionsJob not running on large dataset
Parent Issue
https://github.com/dotCMS/core/issues/26188
Problem Statement
The DropOldContentVersionsJob does not complete on large datasets. It starts but never finishes, apparently because of a slow query:
2024-09-30 22:00:00.151
04:00:00.150 INFO job.DropOldContentVersionsJob - --------------------------------------
2024-09-30 22:00:00.151
04:00:00.150 INFO job.DropOldContentVersionsJob - DropOldContentVersionsJob has started
2024-09-30 22:00:00.151
04:00:00.150 INFO job.DropOldContentVersionsJob - -> Deleting Contentlets older than 365 days...
04:05:10.895 WARN db.DotConnect - Somewhat slow query, total time: 302812.75ms, query preparation time: 0.022711ms, query execution time: 302812.7ms, metadata time: 5.0E-5ms, SQL: SELECT DISTINCT inode FROM contentlet WHERE identifier <> 'SYSTEM_HOST' AND mod_date < ? AND inode NOT IN (SELECT working_inode FROM contentlet_version_info WHERE working_inode = contentlet.inode) AND inode NOT IN (SELECT live_inode FROM contentlet_version_info WHERE live_inode = contentlet.inode), parameters: [2023-10-01 00:00:00.0]
After this, nothing else is logged.
Steps to Reproduce
Run the DropOldContentVersionsJob on a customer instance with a large dataset.
Acceptance Criteria
DropOldContentVersionsJob should run to completion on large datasets.
dotCMS Version
23.10+
Proposed Objective
Customer Support
Proposed Priority
Priority 2 - Important
External Links... Slack Conversations, Support Tickets, Figma Designs, etc.
No response
Assumptions & Initiation Needs
We need to optimize this query https://github.com/dotCMS/core/blob/main/dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentFactoryImpl.java#L715-L719, probably by pulling the inodes in batches.
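One possible shape for the batched approach, as a rough sketch rather than the actual fix: page through the candidate inodes with a keyset cursor (`inode > ?` plus `ORDER BY inode LIMIT ?`) and delete each page before fetching the next. Everything here is hypothetical: the class name `BatchedPurgeSketch`, the `purgeInBatches` helper, and the rewritten SQL string are illustrations, not dotCMS code. The SQL assumes PostgreSQL-style `LIMIT`, assumes `inode` is unique and non-null in `contentlet` (so `DISTINCT` is dropped), and swaps the correlated `NOT IN` subqueries for `NOT EXISTS`. Whether this is actually faster would need to be checked with `EXPLAIN ANALYZE` on a large dataset.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical sketch, not dotCMS code.
class BatchedPurgeSketch {

    // Assumed keyset-paginated variant of the logged query.
    // Parameters: mod_date cutoff, last inode seen, batch size.
    static final String SELECT_BATCH_SQL =
        "SELECT c.inode FROM contentlet c"
        + " WHERE c.identifier <> 'SYSTEM_HOST'"
        + " AND c.mod_date < ? AND c.inode > ?"
        + " AND NOT EXISTS (SELECT 1 FROM contentlet_version_info v"
        + " WHERE v.working_inode = c.inode)"
        + " AND NOT EXISTS (SELECT 1 FROM contentlet_version_info v"
        + " WHERE v.live_inode = c.inode)"
        + " ORDER BY c.inode LIMIT ?";

    /**
     * Repeatedly fetches up to batchSize inodes that sort after the last
     * inode seen, hands each batch to deleteBatch, and stops on an empty page.
     * fetchAfter stands in for running SELECT_BATCH_SQL against the database.
     * Returns the total number of inodes handed to deleteBatch.
     */
    static int purgeInBatches(BiFunction<String, Integer, List<String>> fetchAfter,
                              Consumer<List<String>> deleteBatch,
                              int batchSize) {
        String lastInode = ""; // sorts before any non-empty inode
        int total = 0;
        while (true) {
            List<String> batch = fetchAfter.apply(lastInode, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            deleteBatch.accept(batch);
            total += batch.size();
            // Advance the cursor so a row that fails to delete is not re-fetched forever.
            lastInode = batch.get(batch.size() - 1);
        }
        return total;
    }
}
```

Keeping the cursor even though rows are deleted means each query is bounded by the batch size and the loop always makes progress, so a single five-minute scan is replaced by many short ones that can each be logged.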
Quality Assurance Notes & Workarounds
No response
Sub-Tasks & Estimates
No response