incubator-pegasus

Bug(duplication): some nodes never start GC of plog after a computer room failure

Open ninsmiracle opened this issue 1 year ago • 2 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? The computer room hosting our duplication master cluster had an accident, and most of the nodes in that room shut down within a short time. After all the nodes came back online, we found that some partitions of the duplicated table never garbage-collected their private log (plog) again.

  2. What did you expect to see? All partitions garbage-collect their plog correctly.

  3. What did you see instead? stdout (error log):

// stdout
90146:E2024-05-14 15:59:52.512 (1715673592512665104 67086) replica.default8.040005fe0319646c: nfs_server_impl.cpp:221:on_get_file_size(): {nfs_service} get stat of file /home/work/ssd2/pegasus/alsgsrv-monetization-master/replica/reps/8.53.pegasus/plog/log.18129.608864535790 failed, err = No such file or directory

We can see that this replica requested an old plog file.

Because the partition cannot garbage-collect its plog as normal, the disk keeps filling up, and we have to clear the plog manually from time to time.

  4. What version of Pegasus are you using? Pegasus v2.4

ninsmiracle, May 22 '24 02:05

Does the file /home/work/ssd2/pegasus/alsgsrv-monetization-master/replica/reps/8.53.pegasus/plog/log.18129.608864535790 actually exist?

acelyc111, Jul 24 '24 06:07

> Does the file /home/work/ssd2/pegasus/alsgsrv-monetization-master/replica/reps/8.53.pegasus/plog/log.18129.608864535790 actually exist?

When the coredump happened, the file did not actually exist.

ninsmiracle, Jul 24 '24 08:07