rq: add a garbage collector to the worker

Open efahl opened this issue 8 months ago • 5 comments

Implement a maintenance hook on the standard Redis Queue (RQ) worker that garbage-collects expired builds. When a build result expires from the queue, its data is removed from the public/store/ directory during the next regular maintenance run (every 600 seconds by default).
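A minimal sketch of the sweep such a hook could perform. It assumes (this is not confirmed in the thread) that each top-level entry in public/store/ is named after the RQ job id of the build that produced it. `collect_expired_builds` and its `job_exists` callback are hypothetical names; this is not the PR's asu/rq.py code.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def collect_expired_builds(store: Path, job_exists: Callable[[str], bool]) -> List[str]:
    """Remove entries in `store` whose queue job no longer exists.

    Assumption: each top-level entry is named after the job id of its build.
    Returns the names of the removed entries.
    """
    removed: List[str] = []
    if not store.is_dir():
        return removed
    for entry in sorted(store.iterdir()):
        if job_exists(entry.name):
            continue  # result still held by the queue: keep its files
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # whole build directory
        else:
            entry.unlink()  # stray file or symlink
        removed.append(entry.name)
    return removed
```

In an `rq.Worker` subclass this could be called from an override of `run_maintenance_tasks()`, which recent rq versions run every `maintenance_interval` seconds, passing something like `lambda job_id: Job.exists(job_id, connection=self.connection)` as the check. Taking the check as a callback keeps the file-system logic testable without a Redis server.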

efahl, Apr 20 '25 20:04

I've been running this for about 9 months now with no issues on my local server.

I test it by setting the TTL values way down in api.py, queueing up a bunch of builds, and watching the logs on the server.

+    failure_ttl: str = "3m"
+    result_ttl: str = "15m"
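To make the shortened values concrete, here is a hypothetical standalone helper (not asu's or rq's own parsing): "3m" is 180 seconds and "15m" is 900 seconds. Because the sweep only runs once per maintenance interval, a finished build should disappear somewhere between `result_ttl` and `result_ttl` plus one interval after it completes, assuming the 600-second default.

```python
import re
from typing import Tuple

# Seconds per unit suffix used in shorthand TTL strings such as "3m".
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert a shorthand such as "3m" or "15m" to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def retention_window(result_ttl: str, maintenance_interval: int = 600) -> Tuple[int, int]:
    """Earliest and latest removal time, in seconds after the build finishes.

    The result expires after result_ttl, but the GC only runs once per
    maintenance_interval, so deletion can lag by up to one interval.
    """
    ttl = parse_duration(result_ttl)
    return ttl, ttl + maintenance_interval
```

With the test values above, a successful build's files should be gone roughly 15 to 25 minutes after it finishes.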

efahl, Apr 20 '25 20:04

But are we also running out of space because old podman/docker containers are hanging around?

Maybe an hourly cron job that does this?

podman container prune --force && podman image prune --force
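If the cron route is taken, the same two commands could also be driven from Python. This is a minimal sketch (`run_prune` is a hypothetical name) that runs them in order and, like the shell `&&`, stops at the first non-zero exit. The runner is injectable so the sequencing can be checked without podman installed.

```python
import subprocess
from typing import Callable, List, Sequence

# The two commands from the comment above, in the order they are chained.
PRUNE_COMMANDS: List[List[str]] = [
    ["podman", "container", "prune", "--force"],
    ["podman", "image", "prune", "--force"],
]


def run_prune(
    run: Callable[..., object] = subprocess.run,
    commands: Sequence[Sequence[str]] = PRUNE_COMMANDS,
) -> None:
    """Run each prune command in turn, stopping at the first failure.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit, so later commands are skipped.
    """
    for cmd in commands:
        run(list(cmd), check=True)
```

An hourly cron entry could then invoke a small script that calls `run_prune()`.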

efahl, Apr 20 '25 20:04

Codecov Report

Attention: Patch coverage is 43.58974% with 22 lines in your changes missing coverage. Please review.

Project coverage is 88.79%. Comparing base (5e65dec) to head (49ba426). Report is 241 commits behind head on main.

Files with missing lines    Patch %    Lines
asu/rq.py                   42.10%     22 Missing ⚠️
@@            Coverage Diff             @@
##             main    #1370      +/-   ##
==========================================
+ Coverage   80.75%   88.79%   +8.03%     
==========================================
  Files          15       15              
  Lines         977     1258     +281     
==========================================
+ Hits          789     1117     +328     
+ Misses        188      141      -47     


codecov[bot], Apr 20 '25 20:04

I've just been playing with the podman API; this could be added to that worker, too. Pruning appears to affect only unused containers/images, so it should be safe to simply call these and reclaim the orphaned space.

$ python -c 'from asu.util import get_podman; print(get_podman().containers.prune())'
{'ContainersDeleted': [], 'SpaceReclaimed': 0}

$ python -c 'from asu.util import get_podman; print(get_podman().images.prune())'
{'ImagesDeleted': [{'Deleted': '4a2e6bf10ec5d4c70ff5b8598193285b6d090d47ad737bfe322c0be298e1e6d1', 'Untagged': ''}], 'SpaceReclaimed': 927715421}
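A sketch of how the maintenance hook might call the podman API, assuming a client shaped like the one `get_podman()` returns above: `containers.prune()` and `images.prune()`, each returning a dict with a 'SpaceReclaimed' count. `prune_podman` and the Protocol classes are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Protocol


class _Pruner(Protocol):
    def prune(self) -> Dict[str, Any]: ...


class _PodmanLike(Protocol):
    containers: _Pruner
    images: _Pruner


def prune_podman(client: _PodmanLike) -> int:
    """Prune unused containers, then unused images; return total bytes reclaimed.

    `client` is expected to be something like asu.util.get_podman().
    """
    total = 0
    # Containers first, so images held only by stopped containers are
    # no longer in use when the image prune runs.
    for manager in (client.containers, client.images):
        report = manager.prune()
        total += int(report.get("SpaceReclaimed", 0) or 0)
    return total
```

Fed the two outputs shown above, it would report 927715421 bytes (about 928 MB), which the worker could log after each maintenance pass.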

efahl, Apr 20 '25 22:04

Right now this is my crontab; however, we could also use the GC worker:

@daily cd /srv && find ./store  -ctime +1 -exec rm -rf {} \;
@daily podman volume prune -af
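For comparison with the crontab's `find` line, a Python sketch of the same age-based cleanup (`prune_older_than` is a hypothetical name). It mirrors find's whole-day rounding, under which `-ctime +1` only matches entries whose status change time is at least 48 hours old. Unlike the `find` command, it only examines the top-level entries of store rather than also testing ./store itself and every file inside it.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

DAY = 86400  # seconds in one 24-hour period, as find counts them


def prune_older_than(store: Path, days: int = 1, now: Optional[float] = None) -> List[str]:
    """Delete top-level entries of `store` whose ctime age exceeds `days`.

    Like find's -ctime +N, the age is truncated to whole days before the
    comparison, so days=1 removes entries at least 48 hours old.
    """
    now = time.time() if now is None else now
    removed: List[str] = []
    for entry in sorted(store.iterdir()):
        age_days = int((now - entry.lstat().st_ctime) // DAY)
        if age_days > days:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(entry.name)
    return removed
```

The optional `now` argument exists so the cutoff can be exercised without waiting days.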

aparcar, Apr 20 '25 22:04