
Object loss after recovery cancelled on hetero-disk, auto-vnodes and avoiding-diskfull cluster

tmenjo opened this issue 8 years ago • 2 comments

Reproducibility

100%.

I used Sheepdog 1.0_88_g978eeae, but other versions may also reproduce this.

Steps to reproduce

  1. Set up a 2-replica cluster with 4 nodes (say N0-N3) like below:
    • hetero-disk: each of N0-N2 has a 128-MiB disk and N3 has a 256-MiB one
    • auto-vnodes (sheep and dog cluster format without -V)
    • avoiding-diskfull enabled (dog cluster format with -F)
  2. Create a 256-MiB thin-provisioned VDI alpha
  3. Do dog vdi write alpha to fill it with some non-zero data
  4. Do dog node kill N3
  5. Wait until recovery is cancelled because of diskfull
  6. Do dog vdi read alpha (a script sketch of these steps follows this list)
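
For reference, here is a minimal sketch of these steps. It is not the original script: the store paths, ports, zones and the amount of data written (200 MiB, chosen so that two replicas no longer fit on N0-N2 once N3 leaves) are assumptions, and the 128-MiB and 256-MiB filesystems are expected to be mounted at the store paths beforehand.

    #!/bin/sh -eu
    # Sketch only: assumes 128-MiB filesystems are already mounted at
    # /tmp/sheep/N0..N2, a 256-MiB one at /tmp/sheep/N3, and that the
    # "local" cluster driver works on this host.

    # Step 1: start 4 sheep daemons without -V (auto-vnodes), then format
    # a 2-replica cluster with the avoiding-diskfull option (-F).
    for i in 0 1 2 3; do
        sheep -c local -p $((7000 + i)) -z "$i" "/tmp/sheep/N$i"
    done
    sleep 1
    dog cluster format -c 2 -F

    # Steps 2-3: 256-MiB thin-provisioned VDI, filled with non-zero data
    # (the 200-MiB amount is an assumption, not taken from the report).
    dog vdi create alpha 256M
    dd if=/dev/urandom of=/tmp/alpha.orig bs=1M count=200
    dog vdi write alpha < /tmp/alpha.orig

    # Step 4: kill N3, assuming it got node id 3 (check "dog node list").
    dog node kill 3

    # Step 5: wait until recovery is cancelled because of diskfull; the
    # sheep.log on N0-N2 records this. A fixed sleep is a crude stand-in.
    sleep 60

    # Step 6: read back what was written and compare.
    dog vdi read alpha 0 $((200 * 1024 * 1024)) > /tmp/alpha.read
    cmp /tmp/alpha.orig /tmp/alpha.read && echo OK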

Expected behavior

At step 6, I can read back the same data that was written at step 3. Because this is a 2-replica cluster, up to 1 node failure should be tolerable.

Actual behavior

At step 6, I cannot read back the same data that was written at step 3.

I think this is real object loss. dog vdi read reported "Failed to read object 00ed202b00000026 No object found". I could find that object just after step 3, but after step 5 it is gone from every alive node (N0-N2), even from .stale.
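
A quick way to check this, assuming the plain store layout (objects under <store>/obj, with stale copies kept in an obj/.stale subdirectory under epoch-suffixed names) and the store paths from the sketch above, is to search each alive node's object directory:

    # Look for the lost object on every alive node; the pattern also
    # matches epoch-suffixed copies under obj/.stale.
    oid=00ed202b00000026
    for i in 0 1 2; do
        echo "== N$i =="
        find "/tmp/sheep/N$i/obj" -name "${oid}*"
    done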

Full reproduction script and running log

tmenjo avatar Feb 16 '17 08:02 tmenjo

Can this problem be reproduced with the fixed vnode mode? I believe the auto vnode mechanism is completely flawed and should be removed in the future. If this problem is specific to auto vnodes, its priority wouldn't be high.

mitake avatar Feb 17 '17 02:02 mitake

No, this is not reproduced on a fixed-vnodes cluster. As you mentioned, this seems to be an auto-vnode-specific problem.

I agree that the auto-vnode feature should be removed. At the very least, auto-vnodes should be calculated from node-local disk space, not cluster-wide capacity. For example, assign 1 vnode per 1 GiB of disk space; a node with a 100-GiB disk would get 100 vnodes. A rough sketch follows.
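
The sketch below uses df for the node-local capacity and feeds the result to sheep via the -V option mentioned above; the 1-vnode-per-GiB ratio is just the illustrative number from this comment, not existing sheepdog behaviour.

    # Proposed rule: derive the vnode count from node-local disk space,
    # 1 vnode per GiB (illustrative only, not existing sheepdog code).
    store_dir=${1:-/tmp/sheep/N3}                    # assumed store path
    size_kib=$(df -Pk "$store_dir" | awk 'NR==2 {print $2}')
    vnodes=$(( size_kib / (1024 * 1024) ))           # KiB -> GiB, truncated
    [ "$vnodes" -ge 1 ] || vnodes=1
    echo "sheep -V $vnodes $store_dir"               # e.g. 100 vnodes for a 100-GiB disk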

tmenjo avatar Feb 17 '17 06:02 tmenjo