Daemons: reaper, avoid multiple reaper workers working on the same replicas; rucio#6512
Fixes https://github.com/rucio/rucio/issues/6512
## Overview

This branch introduces two complementary mechanisms to reduce the likelihood that multiple reaper workers operate on the same replicas that are about to be deleted:
- A) Immediate cleaning of replicas from the Rucio DB (configurable)
- B) Refreshing replicas to be deleted (always enabled)
## A) Immediate cleaning of replicas

### Existing Mode (Default)

Configuration: `enable_immediate_cleanup = false` (default, can be omitted)

```ini
[reaper]
# Traditional mode - no additional configuration needed
# enable_immediate_cleanup = false   # Default value, can be omitted
delay_seconds = 600                  # Standard replica selection delay
chunk_size = 100                     # Number of replicas to process per batch
```
Behavior:
- Database cleanup happens once after all physical deletions (hundreds to thousands) complete
- Maintains the original behavior
### Immediate Cleanup Mode (Opt-in)

Configuration: `enable_immediate_cleanup = true`

```ini
[reaper]
enable_immediate_cleanup = true   # Enable immediate cleanup optimization
db_batch_size = 50                # Batch size for immediate database cleanup (default: 50)
refresh_trigger_ratio = 80        # Percentage of delay_seconds before refreshing (default: 80)
delay_seconds = 600               # Standard replica selection delay
chunk_size = 100                  # Number of replicas to process per batch
```
Behavior:
- Database cleanup happens in configurable batches while physical deletions are still in progress
- Increases database load (more statements executed)
- Replicas are removed from the Rucio DB sooner after physical deletion, making the deletions visible to external scripts earlier
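As an illustration, the batching behavior can be sketched as follows. This is a simplified, hypothetical model, not the actual reaper code (which lives in `lib/rucio/daemons/reaper/reaper.py`); the function name and return shape are made up for the example.

```python
def process_replicas(replicas, db_batch_size=50, enable_immediate_cleanup=True):
    """Sketch: physically delete replicas, cleaning the DB in batches.

    Returns (immediately_cleaned, remaining_for_main_loop).
    """
    pending_db_cleanup = []
    cleaned = 0
    for replica in replicas:
        # ... physical deletion against the storage would happen here ...
        pending_db_cleanup.append(replica)
        if enable_immediate_cleanup and len(pending_db_cleanup) >= db_batch_size:
            # Stand-in for the actual batched DB delete
            cleaned += len(pending_db_cleanup)
            pending_db_cleanup.clear()
    # Anything left over is handled by the final/main-loop cleanup
    return cleaned, len(pending_db_cleanup)
```

With 150 replicas and `db_batch_size = 50`, this performs three immediate cleanups and leaves nothing for the main loop; with `enable_immediate_cleanup=False`, all 150 fall through to the main loop, matching the traditional behavior.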
## Configuration Parameters

| Parameter | Default | Description |
|---|---|---|
| `enable_immediate_cleanup` | `false` | Enable/disable the immediate database cleanup optimization |
| `db_batch_size` | `50` | Number of replicas to clean from the database in each immediate batch |
| `refresh_trigger_ratio` | `80` | Percentage of `delay_seconds` after which to refresh remaining replicas (applies to both traditional and immediate cleanup modes) |
| `delay_seconds` | `600` | Standard delay for replica selection (existing parameter) |
| `chunk_size` | `100` | Number of replicas to process per iteration (existing parameter) |
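Rucio reads these options through its own config layer (`rucio.common.config`); purely to illustrate the defaults listed above, here is a minimal `configparser`-based sketch (the helper name `load_reaper_options` is hypothetical, not a Rucio API):

```python
import configparser

# Defaults matching the table above
DEFAULTS = {
    'enable_immediate_cleanup': 'false',
    'db_batch_size': '50',
    'refresh_trigger_ratio': '80',
    'delay_seconds': '600',
    'chunk_size': '100',
}


def load_reaper_options(text):
    """Parse a [reaper] section, falling back to the documented defaults."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    section = dict(cfg['reaper']) if cfg.has_section('reaper') else {}
    merged = {**DEFAULTS, **section}
    return {
        'enable_immediate_cleanup': merged['enable_immediate_cleanup'].lower() == 'true',
        'db_batch_size': int(merged['db_batch_size']),
        'refresh_trigger_ratio': int(merged['refresh_trigger_ratio']),
        'delay_seconds': int(merged['delay_seconds']),
        'chunk_size': int(merged['chunk_size']),
    }
```

Any option omitted from the config file keeps its default, so existing deployments that never set `enable_immediate_cleanup` retain the traditional behavior.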
## B) Replica Refresh Control

Always enabled.

The reaper uses a `delay_seconds` mechanism to prevent multiple workers from processing the same replicas: once replicas are marked as `BEING_DELETED`, other workers will not select them until `delay_seconds` have passed since their last update.

To prevent race conditions when processing takes longer than expected, the reaper refreshes the `updated_at` timestamp of the remaining replicas:

```ini
[reaper]
delay_seconds = 600          # Replicas become selectable by other workers after 10 minutes
refresh_trigger_ratio = 80   # Refresh remaining replicas after 80% of delay_seconds (8 minutes)
```
How it works:
1. A worker starts processing 100 replicas at time T=0
2. At T=8 minutes (80% of 10 minutes), if replicas are still being processed:
   - the worker calls `refresh_replicas()` on the remaining unprocessed replicas
   - this updates their `updated_at` timestamp to the current time
   - other workers will wait another 10 minutes before selecting these replicas
3. The original worker continues processing without interference
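The trigger condition boils down to a single elapsed-time check. The sketch below is illustrative, not the daemon's actual code; `should_refresh` is a hypothetical helper:

```python
import time


def should_refresh(start_time, delay_seconds=600, refresh_trigger_ratio=80, now=None):
    """True once refresh_trigger_ratio percent of delay_seconds has elapsed."""
    now = time.time() if now is None else now
    trigger_after = delay_seconds * refresh_trigger_ratio / 100.0
    return (now - start_time) >= trigger_after
```

With the defaults, the refresh fires 480 seconds (8 minutes) into processing, leaving a 120-second buffer before other workers could reclaim the replicas.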
### Refresh Configuration Examples

Conservative (longer processing time allowed):

```ini
[reaper]
delay_seconds = 900          # 15 minutes before other workers can take over
refresh_trigger_ratio = 90   # Refresh after 13.5 minutes
```

Aggressive (faster worker coordination):

```ini
[reaper]
delay_seconds = 300          # 5 minutes before other workers can take over
refresh_trigger_ratio = 70   # Refresh after 3.5 minutes
```

Multi-worker environment (balanced):

```ini
[reaper]
delay_seconds = 600          # Standard 10 minutes
refresh_trigger_ratio = 75   # Refresh after 7.5 minutes (leaves a 2.5-minute buffer)
```
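The trigger times quoted in these examples follow directly from `delay_seconds * refresh_trigger_ratio / 100`; a quick sanity check (illustrative helper, not part of Rucio):

```python
def refresh_trigger_seconds(delay_seconds, refresh_trigger_ratio):
    """Seconds of processing after which remaining replicas are refreshed."""
    return delay_seconds * refresh_trigger_ratio / 100.0

# Conservative: 900 s * 90% = 810 s  (13.5 minutes)
# Aggressive:   300 s * 70% = 210 s  (3.5 minutes)
# Balanced:     600 s * 75% = 450 s  (7.5 minutes)
```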
## Performance Tuning Examples

### High-Throughput Environment

Optimize for maximum performance with frequent immediate cleanups:

```ini
[reaper]
enable_immediate_cleanup = true
db_batch_size = 25           # Smaller batches for more frequent cleanup
refresh_trigger_ratio = 70   # Refresh remaining replicas earlier
delay_seconds = 300          # Shorter delay for faster processing
chunk_size = 200             # Larger chunks for higher throughput
```

### Conservative Environment

Optimize for reliability with larger batches:

```ini
[reaper]
enable_immediate_cleanup = true
db_batch_size = 100          # Larger batches, fewer database calls
refresh_trigger_ratio = 90   # Wait longer before refreshing
delay_seconds = 900          # Longer delay for stability
chunk_size = 50              # Smaller chunks for reliability
```

### Multi-Worker Environment

Optimize for coordination between multiple reaper workers:

```ini
[reaper]
enable_immediate_cleanup = true
db_batch_size = 30           # Moderate batch size
refresh_trigger_ratio = 75   # Refresh before other workers can interfere
delay_seconds = 600          # Standard delay
chunk_size = 100             # Standard chunk size
```
## Monitoring and Debugging

### Log Messages

Traditional mode:

```
DEBUG: Deletion complete for RSE CERN-PROD - processed 150 replicas, all 150 will be cleaned up by main loop (traditional mode)
DEBUG: Main loop cleanup SUCCESS - deleted 150 remaining replicas in 2.34 seconds
```

Immediate cleanup mode:

```
DEBUG: Starting deletion for RSE CERN-PROD with 150 replicas, enable_immediate_cleanup=True, db_batch_size=50, delay_seconds=600
DEBUG: Immediate cleanup SUCCESS: deleted 50 replicas from database (batch #1)
DEBUG: Immediate cleanup SUCCESS: deleted 50 replicas from database (batch #2)
DEBUG: Final cleanup SUCCESS: deleted 50 remaining replicas from database
DEBUG: Deletion complete for RSE CERN-PROD - processed 150 replicas, performed 3 immediate cleanups, total immediate cleaned: 150, remaining for main loop: 0
```

Replica refresh messages:

```
DEBUG: Refresh trigger time set to 480.0 seconds (80% of delay_seconds=600)
DEBUG: Refresh triggered after 485.2 seconds - refreshing 45 remaining replicas (out of 100 total)
DEBUG: Successfully refreshed 45 remaining replicas after 485.2 seconds
WARNING: Failed to bump updated_at for remaining replicas BEING_DELETED
```

### Configuration Verification

Check the active configuration at startup:

```
DEBUG: Optimization configuration - enable_immediate_cleanup=True, db_batch_size=50, refresh_trigger_ratio=80%, delay_seconds=600, chunk_size=100, total_workers=4
```
## Replica Refresh Function

The `refresh_replicas()` function in `rucio.core.replica` provides the underlying mechanism:

```python
from rucio.core.replica import refresh_replicas

# Update the updated_at timestamp of the replicas to prevent other workers from taking them
success = refresh_replicas(
    rse_id='CERN-PROD_DATADISK',
    replicas=[
        {'scope': 'cms', 'name': 'file1.root'},
        {'scope': 'cms', 'name': 'file2.root'},
    ],
)
```
## Troubleshooting

### Common Issues

Workers taking over each other's work:

```ini
# Solution: reduce the refresh trigger ratio or increase the delay
[reaper]
delay_seconds = 900          # Increase to 15 minutes
refresh_trigger_ratio = 70   # Refresh after 70% (10.5 minutes)
```

Database performance issues with immediate cleanup:

```ini
# Solution: increase the batch size to reduce DB calls
[reaper]
enable_immediate_cleanup = true
db_batch_size = 100          # Larger batches, fewer DB operations
```

Slow processing causing timeouts:

```ini
# Solution: increase the delay and refresh earlier
[reaper]
delay_seconds = 1200         # 20 minutes total
refresh_trigger_ratio = 60   # Refresh after 12 minutes
```
## Codecov Report
:x: Patch coverage is 0% with 144 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 7.05%. Comparing base (695df6e) to head (a112d52).
:warning: Report is 50 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/rucio/daemons/reaper/reaper.py | 0.00% | 112 Missing :warning: |
| lib/rucio/core/replica.py | 0.00% | 32 Missing :warning: |
Additional details and impacted files:

```diff
@@            Coverage Diff            @@
##           master    #8048     +/-  ##
=========================================
- Coverage    7.13%    7.05%   -0.08%
=========================================
  Files         272      272
  Lines       45763    45874    +111
=========================================
- Hits         3266     3238     -28
- Misses      42497    42636    +139
```
@mgajek-cern thank you for the review. I considered splitting, but decided against it because both mechanisms touch mostly the same files and serve the same goal: reducing the likelihood of overlapping work by multiple workers. The changes are backwards compatible with existing deployments.
Nonetheless, I'll follow up on the other suggestions and on the ones from the previous attempt in #7199.
@labkode please squash this now. It shouldn't be reviewed in a wip state, this just makes the work harder for the reviewers, who are already very thinly spread.
@bari12 done.
Once we merge #8269, Rucio's ruff pre-commit checks will also be required to pass on the CI. Right now this is not enforced because things are misconfigured, so the checks only run when someone has enabled pre-commit locally. I have flagged a couple of problems that will arise in the future, but to avoid mentioning every one of them in the review (there are some more), maybe you can enable pre-commit locally and fix the problems in the files you have modified (note: don't worry about E501 line-too-long). If you don't have time, don't bother (they are just stylistic): just address the ones flagged here and I can fix everything once we merge #8269. No problem.
@Geogouz thanks for the changes, all good from my side.