Full resync always gets stuck in congested (Behind) state after a few days
Hi,
When the on-congestion policy is "pull-ahead", the DRBD resource always gets stuck in the Behind state after 2~3 days and the sync status starts decreasing (98.20 -> 98.19 ... 98.12%). The kernel logs contain no entry about congestion-fill or congestion-extents being reached, as there would be when the configured limit is hit.
I tried increasing congestion-fill to very large values (100M -> 200M, 500M, or 0 to disable it) and congestion-extents (to values even higher than al-extents), and also commenting both out of the configuration completely, but nothing helped; the outcome was the same.
Commenting out on-congestion pull-ahead (switching back to the default, block) helps: the resync starts progressing again.
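For reference, the congestion settings described above would look roughly like the fragment below. This is only a sketch: the real configuration is in the attached storage.txt, the resource name `storage` is taken from the kernel log prefix in this report, and putting the options in the resource-level `net` section (rather than scoping them to the backup-dc connection) is an assumption.

```
resource storage {
    net {
        on-congestion   pull-ahead;   # the default policy is "block"
        congestion-fill 100M;         # also tried 200M, 500M and 0 (disabled)
        # congestion-extents was also raised, to values above al-extents
    }
    # ... rest of the resource definition (see storage.txt) ...
}
```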
When congested, the logs on the primary fill up with thousands of identical entries in a loop:
[ +0.104537] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.026862] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.002791] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.001076] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.000718] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.043059] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.040371] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.004944] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.000894] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.000695] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.098964] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.046465] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.003419] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.004561] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.010005] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.046983] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.022996] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.006174] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.011331] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.009500] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.264396] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.052604] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.004883] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.008559] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.008532] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
ENV: DRBD 9.1.7, Oracle Linux 8.6 (latest updates, but the same happens with packages a few months old). The full configuration is in the attachment; congestion is only configured for the backup storage node (backup-dc) because it is much slower. storage.txt
The zero progress was actually caused by the dynamic sync-rate controller: after switching off congestion control the resync started, but it never finished either. After setting a fixed rate ("c-plan-ahead 0" and "resync-rate 50M") everything works as expected; a fixed rate is still better than no rate at all.
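A minimal sketch of that fixed-rate workaround is shown below. It assumes the resource is named `storage` (as in the kernel log prefix) and that the options go in the resource-level `disk` section; the actual layout in storage.txt may differ, for example by scoping them to the backup-dc connection only.

```
resource storage {
    disk {
        c-plan-ahead 0;     # 0 turns off the dynamic resync-rate controller
        resync-rate  50M;   # fixed rate, only effective while the controller is off
    }
    # ... rest of the resource definition (see storage.txt) ...
}
```

After editing the file, the change can typically be applied to the running resource with `drbdadm adjust storage`.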