RFE: Offline deduplication

Open DemiMarie opened this issue 3 years ago • 9 comments

There are cases where it would be quite useful to be able to compact a thin pool by deduplicating identical blocks while the system is offline.

DemiMarie avatar Apr 03 '22 19:04 DemiMarie

Not quite the same thing, but look at the dm-archive tool I'm currently working on.

jthornber avatar Apr 03 '22 19:04 jthornber

Could that be used to make a thin_dedup tool?

DemiMarie avatar Apr 03 '22 20:04 DemiMarie

Just for reference, @tasket's wyng backup might be an interesting project to take a glance at (not equivalent but maybe some overlap):

https://github.com/tasket/wyng-backup

B

brendanhoar avatar Apr 03 '22 20:04 brendanhoar

I think Demi is looking for a tool to deduplicate thin volumes in-place. From comments I've read in Linux discussion (and here?) I gathered that this would not be on the thinp roadmap.

OTOH, it seems like a narrowly-targeted form of dedup could be approximated for two target volumes by scanning for differences, snapshotting one volume, then updating it with the mapped differences (and finally replacing the snapshotted original with the snapshot).
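A minimal sketch of the "scanning for differences" step, assuming both volumes can be read as plain binary streams and compared in fixed-size chunks. The function name `differing_chunks` and the chunk size are hypothetical choices for illustration; a real tool would more likely read the pool metadata than compare raw data.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size; a real pool's block size may differ


def differing_chunks(a, b, chunk_size=CHUNK_SIZE):
    """Yield the indices of chunks whose contents differ between two streams."""
    index = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if not chunk_a and not chunk_b:
            break  # both streams exhausted
        if chunk_a != chunk_b:
            yield index
        index += 1


# In-memory stand-ins for two volumes that differ only in the middle chunk.
vol_a = io.BytesIO(b"A" * 8 + b"B" * 8 + b"C" * 8)
vol_b = io.BytesIO(b"A" * 8 + b"X" * 8 + b"C" * 8)
changed = list(differing_chunks(vol_a, vol_b, chunk_size=8))
```

Only the chunks listed in `changed` would then need to be written to the snapshot; every other chunk stays shared with the volume it was snapshotted from.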

tasket avatar Apr 04 '22 03:04 tasket

FWIW, Wyng can facilitate this as part of a restore from an archive (using a sparse write mode to update an existing volume, it will skip over chunks that match). But that means performing a backup first.
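As a rough illustration of the idea behind a sparse write mode (this is not Wyng's actual code; `sparse_write` and its parameters are hypothetical), the target is read chunk by chunk and a chunk is only written when it differs from the source:

```python
import io


def sparse_write(source, target, chunk_size=4096):
    """Copy source onto target, writing only the chunks that differ.

    Returns the number of chunks actually written.
    """
    written = 0
    offset = 0
    while True:
        new = source.read(chunk_size)
        if not new:
            break  # end of source
        target.seek(offset)
        old = target.read(len(new))
        if old != new:
            target.seek(offset)
            target.write(new)
            written += 1
        offset += len(new)
    return written


# In-memory stand-ins: the existing volume differs from the archive in one chunk.
existing = io.BytesIO(b"aaaabbbbcccc")
archive = io.BytesIO(b"aaaaXXXXcccc")
chunks_written = sparse_write(archive, existing, chunk_size=4)
```

On a thin volume, a chunk that is never written keeps whatever block mapping it already had, so skipping matching chunks is what preserves sharing with other volumes.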

tasket avatar Apr 04 '22 03:04 tasket

I think Demi is looking for a tool to deduplicate thin volumes in-place.

That’s correct. My goal is to be able to reclaim shared space on a Qubes OS system.

DemiMarie avatar Apr 04 '22 03:04 DemiMarie

A particular use case for Qubes: when backing up and then restoring thin LVs that were snapshots (e.g. of cloned Qubes OS VMs), most methods leave much more space in use after the restore than before. The original pool shared many blocks between the LVs, but once all of the LVs are restored to a new thin pool, no blocks are shared.

B
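One way to estimate how much sharing was lost after such a restore is to hash fixed-size chunks across the restored volumes and count the duplicates. The sketch below assumes the volumes can be read as binary streams; `count_duplicate_chunks` and the chunk size are hypothetical, and nothing here reads or modifies thin-pool metadata.

```python
import hashlib
import io


def count_duplicate_chunks(streams, chunk_size=4096):
    """Return (total_chunks, unique_chunks) across all streams, by SHA-256 digest."""
    seen = set()
    total = 0
    for stream in streams:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += 1
            seen.add(hashlib.sha256(chunk).digest())
    return total, len(seen)


# In-memory stand-ins for a base volume and a clone that changed one chunk.
base = io.BytesIO(b"A" * 4 + b"B" * 4)
clone = io.BytesIO(b"A" * 4 + b"C" * 4)
total, unique = count_duplicate_chunks([base, clone], chunk_size=4)
```

The difference `total - unique` is the number of chunks that an offline deduplication pass could in principle map to a single shared block.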

brendanhoar avatar Apr 25 '22 21:04 brendanhoar

This can actually be disastrous, as it can make backups impossible to restore. Deduplication during restore is necessary to prevent this problem.

DemiMarie avatar Apr 25 '22 21:04 DemiMarie

dm-archive (which I'm going to rename to blk-archive) will check whether it's restoring to a thin device. If it is, it will read the mappings and the data, then do minimal writes to restore the backup. This is a flexible approach because it allows us to regain sharing between any two related thin devices. E.g., the backup might have been taken a month ago and restored to a snapshot of the current head.

jthornber avatar Apr 26 '22 06:04 jthornber