bcachefs

Support lazy, best-effort replication to maximize performance while balancing durability?

Open • SUPERCILEX opened this issue 3 years ago • 0 comments

Coming off of this discussion: https://www.reddit.com/r/bcachefs/comments/ualzfn/replica_settings_per_group/

Ideally, writes would get the maximum possible performance (i.e. writing to just one disk), while data is replicated across multiple disks lazily, on a best-effort basis.

Example configuration 1:

  • Replicas=2
  • 1 SSD + 1 HDD
  • Writes go to the SSD and return once complete
  • In the background, data is replicated to the HDD
    • The time between a write landing on the SSD and its copy reaching the HDD should be minimized to shrink the window for data loss. It can never be zero in this scheme, so some data loss is possible, but that's an acceptable compromise.
  • The HDD can be yanked and things will still work. Once the HDD is restored, data that hasn't been replicated yet is copied over.
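To make configuration 1 concrete, here is a minimal Python sketch of the intended behaviour. It is a toy in-memory model under stated assumptions, not bcachefs code: `Device`, `LazyReplicator`, `write`, `replicate_step` and `replica_count` are hypothetical names. A write completes once the SSD holds the data; a background step copies the backlog to the HDD whenever the HDD is online, and simply keeps the backlog while the HDD is yanked.

```python
from collections import deque


class Device:
    """Toy device (hypothetical): a name, an online flag, and stored extents by id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.data: dict[int, bytes] = {}


class LazyReplicator:
    """Hypothetical model of configuration 1 (replicas=2, one SSD + one HDD)."""

    def __init__(self, ssd: Device, hdd: Device) -> None:
        self.ssd = ssd
        self.hdd = hdd
        self.pending: deque[int] = deque()  # written to the SSD, not yet on the HDD

    def write(self, key: int, value: bytes) -> None:
        # Foreground path: the write completes once the SSD has it (1 replica).
        self.ssd.data[key] = value
        self.pending.append(key)

    def replicate_step(self) -> int:
        """Background path: drain the backlog to the HDD; return how many were copied."""
        if not self.hdd.online:
            return 0  # HDD yanked: keep the backlog and keep accepting writes
        copied = 0
        while self.pending:
            key = self.pending.popleft()
            self.hdd.data[key] = self.ssd.data[key]  # copy the latest value
            copied += 1
        return copied

    def replica_count(self, key: int) -> int:
        """Number of online devices currently holding `key`."""
        return sum(key in d.data for d in (self.ssd, self.hdd) if d.online)


ssd, hdd = Device("ssd"), Device("hdd")
r = LazyReplicator(ssd, hdd)
r.write(1, b"a")
print(r.replica_count(1))  # 1: only on the SSD so far
hdd.online = False         # yank the HDD
r.write(2, b"b")           # writes still succeed
print(r.replicate_step())  # 0: nowhere to copy to
hdd.online = True          # HDD restored
print(r.replicate_step())  # 2: backlog copied over
print(r.replica_count(2))  # 2
```

The gap between `write` and `replicate_step` is exactly the window in which only one replica exists; a real implementation would try to keep that window as short as possible.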

Example configuration 2:

  • Replicas=3
  • 2 SSDs + 3 HDDs
  • Writes go to one of the SSDs. Under load, writes are multiplexed across both SSDs, so you get up to double the write throughput (assuming PCIe bandwidth isn't the bottleneck).
  • In the background, data is replicated either to the other SSD and one HDD, or to two HDDs. Here the SSDs effectively act as a cache: all the replicas can end up on the HDDs if needed, leaving the SSDs free to handle active writes.
  • Any two devices can be yanked and things will keep working.
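Configuration 2 involves two separate decisions: which SSD takes a foreground write, and where the missing replicas go afterwards. The Python sketch below shows one possible policy under stated assumptions, not how bcachefs allocates today: `Dev`, `ForegroundPicker` and `choose_targets` are hypothetical, round-robin is assumed for multiplexing, and HDDs are preferred for background copies so the SSDs stay free for foreground writes.

```python
from dataclasses import dataclass


@dataclass
class Dev:
    name: str
    kind: str            # "ssd" or "hdd" (hypothetical device labels)
    online: bool = True
    free: int = 100      # free space, arbitrary units


class ForegroundPicker:
    """Round-robin foreground writes across online SSDs so concurrent writes
    are multiplexed; fall back to any online device if no SSD is left."""

    def __init__(self, devs: list[Dev]) -> None:
        self.devs = devs
        self.turn = 0

    def pick(self) -> Dev:
        ssds = [d for d in self.devs if d.kind == "ssd" and d.online]
        pool = ssds or [d for d in self.devs if d.online]
        if not pool:
            raise RuntimeError("no online devices")
        dev = pool[self.turn % len(pool)]
        self.turn += 1
        return dev


def choose_targets(holders: set[str], devs: list[Dev], replicas: int) -> list[Dev]:
    """Pick devices for the missing replicas of one extent (best-effort).

    HDDs are preferred so the SSDs stay free for foreground writes; ties go to
    the device with the most free space. May return fewer devices than needed.
    """
    missing = max(replicas - len(holders), 0)
    candidates = [d for d in devs if d.online and d.free > 0 and d.name not in holders]
    candidates.sort(key=lambda d: (d.kind != "hdd", -d.free))  # HDDs first
    return candidates[:missing]


devs = [Dev("ssd0", "ssd"), Dev("ssd1", "ssd"),
        Dev("hdd0", "hdd"), Dev("hdd1", "hdd", free=50), Dev("hdd2", "hdd")]
picker = ForegroundPicker(devs)
print([picker.pick().name for _ in range(4)])  # ['ssd0', 'ssd1', 'ssd0', 'ssd1']
print([d.name for d in choose_targets({"ssd0"}, devs, 3)])  # ['hdd0', 'hdd2']
devs[2].online = devs[4].online = False  # yank two HDDs
print([d.name for d in choose_targets({"ssd0"}, devs, 3)])  # ['hdd1', 'ssd1']
```

With all HDDs online, both background copies land on HDDs; with two HDDs yanked, the policy falls back to one HDD plus the other SSD. When too few devices are online it returns fewer targets than requested instead of failing, which matches the best-effort semantics described in the issue.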

In both examples, having fewer than the target number of replicas is A-OK because we're operating on a best-effort basis. Ideally, the time during which data lives without the requisite number of replicas is minimized, but no guarantees are made. For example, if the disks run out of space, replicas could start being deleted to keep the FS operating. The FS is free to make this optimization precisely because replication is best-effort.
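The out-of-space case could be handled by a policy like the following minimal Python sketch. The function name `reclaim` and the extent-to-devices map are hypothetical, illustrative assumptions. As an extra safeguard that the issue does not spell out, it only drops replicas of extents that still have a copy elsewhere, so freeing space is best-effort while the last copy of any data is kept.

```python
def reclaim(extents: dict[int, set[str]], dev: str, needed: int) -> list[int]:
    """Hypothetical best-effort space reclaim on device `dev`.

    `extents` maps an extent id to the set of devices holding a replica.
    Drops up to `needed` replicas from `dev`, most-replicated extents first,
    and never touches an extent whose only copy lives on `dev`.
    Returns the ids whose replica on `dev` was dropped.
    """
    victims = [k for k, holders in extents.items() if dev in holders and len(holders) > 1]
    victims.sort(key=lambda k: -len(extents[k]))
    dropped = victims[:needed]
    for k in dropped:
        extents[k].discard(dev)  # the extent is now under-replicated, not lost
    return dropped


extents = {1: {"ssd", "hdd"}, 2: {"ssd"}, 3: {"ssd", "hdd0", "hdd1"}}
print(reclaim(extents, "ssd", 2))  # [3, 1]
print(reclaim(extents, "ssd", 1))  # []: extent 2 has no other copy
```

Dropped replicas would then be re-created by the same background replication path once space frees up.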

To be clear, replication can only be offered on a best-effort basis when it is lazy (hence why the two come together). Because replication happens after the write has already returned, there is no way to guarantee that the disk you're replicating to is still operational by the time the copy is made.

SUPERCILEX, Apr 24 '22 22:04