incubator-pegasus

Shared log calculated size is unreasonable, or the shared log may be damaged

Open foreverneverer opened this issue 5 years ago • 2 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
  • One replica-server went down, and we manually re-added it to the cluster.

  • We ran the following commands to add the node:

    • remote-command -t meta-server meta.lb.only_move_primary true
    • set_meta_level lively
  2. What did you expect to see? The node server restarts without any errors.

  3. What did you see instead?

  • The perfcounter reported that the shared log was too large: 25853163(MB) > 50000
  • The log showed errors as soon as the node server restarted:
mutation_log.cpp:2057:read_next_log_block(): read data block body failed, size = 328 vs 676, err = ERR_HANDLE_EOF
replica_stub.cpp:552:initialize():some shared log state must be lost, smax(1301076891) vs pmax(1301079680)
replica_stub.cpp:565:initialize(): logs are not complete for some replicas, which means that shared log is truncated, mark all replicas as inactive
  4. What version of Pegasus are you using? pegasus-server-1.12.3-a948e89-glibc2.12-release.tar.gz

  5. Suggestion

  • Use dassert instead of derror if the shared log is found to be damaged when the node server restarts
  • Clean up the node and then restart it
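To make the numbers in the report easier to read, here is a minimal C++ sketch of the kind of comparison the second log line suggests. It is not the code in replica_stub.cpp. It assumes that smax and pmax are the highest decrees recovered from the shared log and the private log respectively, and every name in it (LogState, compare_logs, missing_from_shared, size_is_unreasonable) is hypothetical.

```cpp
#include <cstdint>

// Hypothetical names throughout; this is NOT the replica_stub.cpp implementation.
enum class LogState { Complete, SharedLogBehind, PrivateLogBehind };

// smax: highest decree recovered from the shared log (assumed meaning).
// pmax: highest decree recovered from the private log (assumed meaning).
LogState compare_logs(int64_t smax, int64_t pmax) {
    if (smax < pmax) return LogState::SharedLogBehind;   // shared log lost some state
    if (smax > pmax) return LogState::PrivateLogBehind;  // private log lost some state
    return LogState::Complete;
}

// Number of decrees present in the private log but missing from the shared log.
int64_t missing_from_shared(int64_t smax, int64_t pmax) {
    return pmax > smax ? pmax - smax : 0;
}

// Mirrors the perfcounter comparison from the report: size_mb > limit_mb.
bool size_is_unreasonable(int64_t size_mb, int64_t limit_mb) {
    return size_mb > limit_mb;
}
```

With the values from the log above, compare_logs(1301076891, 1301079680) yields SharedLogBehind, and the gap is 2789 decrees. The reported size of 25853163 MB is far above the 50000 MB limit, which suggests the size was computed from damaged log metadata, not from data actually on disk.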

foreverneverer Jun 16 '20 05:06

What did you expect to see? The node server restarts without any errors.

What did you see instead? The perfcounter reported that the shared log was too large: 25853163(MB) > 50000. The log showed errors as soon as the node server restarted.

So what was the consequence of the oversized shared log? Did it leave the cluster unable to serve requests, or was the replica-server unable to restart?

neverchanje Jun 23 '20 05:06

I think we can consider mocking such a case in the replica's unit tests by intentionally appending some mutations only to the plog (private log), without appending them to the slog (shared log). Then restart the server and see what happens.
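A rough sketch of that scenario as a self-contained toy model in C++, just to pin down what the test would assert. ToyLog, ToyReplica, and restart_after_plog_only_writes are hypothetical stand-ins, not Pegasus classes, and the restart rule (mark the replica inactive when the slog is behind the plog) is an assumption taken from the log messages in the report.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a mutation log; hypothetical, not a Pegasus class.
struct ToyLog {
    std::vector<int64_t> decrees;

    void append(int64_t d) {
        assert(d > max_decree());  // decrees must be strictly increasing
        decrees.push_back(d);
    }
    int64_t max_decree() const { return decrees.empty() ? 0 : decrees.back(); }
};

// Toy replica holding a private log (plog) and a shared log (slog).
struct ToyReplica {
    ToyLog plog;
    ToyLog slog;
    bool active = true;

    // Normal write path: the mutation goes to both logs.
    void write(int64_t d) { plog.append(d); slog.append(d); }

    // Fault injection: the mutation reaches the plog only.
    void write_plog_only(int64_t d) { plog.append(d); }

    // Simulated restart (assumed rule): inactive if the slog is behind the plog.
    void restart() { active = slog.max_decree() >= plog.max_decree(); }
};

// Returns true if the replica is still active after the simulated restart.
bool restart_after_plog_only_writes(int normal_writes, int plog_only_writes) {
    ToyReplica r;
    int64_t d = 0;
    for (int i = 0; i < normal_writes; ++i) r.write(++d);
    for (int i = 0; i < plog_only_writes; ++i) r.write_plog_only(++d);
    r.restart();
    return r.active;
}
```

In this model, restart_after_plog_only_writes(5, 0) leaves the replica active, while restart_after_plog_only_writes(5, 3) marks it inactive. A real unit test would need to use the actual replica and mutation log classes and check how the server behaves on restart.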

neverchanje Jun 23 '20 06:06