Disk watcher pausing and resuming over short intervals

Open CorentinB opened this issue 11 months ago • 2 comments

I noticed a full hang of Zeno after the disk watcher repeatedly reported "pausing" and then "resuming".

```
2025-02-07T02:10:40+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.68 GB, threshold=53.69 GB
2025-02-07T02:24:10+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T02:27:40+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.44 GB, threshold=53.69 GB
2025-02-07T02:34:40+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T02:35:50+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.57 GB, threshold=53.69 GB
2025-02-07T02:51:45+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T02:52:25+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.47 GB, threshold=53.69 GB
2025-02-07T03:13:20+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T03:14:00+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.40 GB, threshold=53.69 GB
2025-02-07T03:18:20+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T03:20:10+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.46 GB, threshold=53.69 GB
2025-02-07T03:36:00+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T03:37:45+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.20 GB, threshold=53.69 GB
2025-02-07T03:48:55+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T03:51:05+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.66 GB, threshold=53.69 GB
2025-02-07T03:52:35+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
2025-02-07T03:54:25+01:00 [WARN] Low disk space, pausing the pipeline   component=controler.diskWatcher err=low disk space: free=53.55 GB, threshold=53.69 GB
2025-02-07T04:39:30+01:00 [INFO] Disk space is sufficient, resuming the pipeline        component=controler.diskWatcher
```

CorentinB avatar Feb 07 '25 15:02 CorentinB

The solution is to implement a backoff delay before resuming, so that the WARC uploader has time to upload more than a few MB.

equals215 avatar Feb 07 '25 15:02 equals215

Will address this later.

equals215 avatar Feb 07 '25 15:02 equals215

Fix: https://github.com/internetarchive/Zeno/pull/328

vbanos avatar Jun 02 '25 08:06 vbanos

Another attempt: https://github.com/internetarchive/Zeno/pull/331

vbanos avatar Jun 05 '25 16:06 vbanos