btrfs-progs

Add option to btrfs filesystem defragment to not defragment shared extents

Open · crass opened this issue 3 years ago · 2 comments

It's a well-known issue that running defragment will break up reflink clones. A quick search shows this is the source of a lot of headaches for users, and I bet it will continue to be. I'm hoping this could be the start of remedying that.

The big idea here is to allow a sub-optimal defragmentation that does not break up reflinks, or shared extents in general. I noticed that the defragment command can take a starting byte offset and a length in bytes as arguments. So it seems defragment should be able to (easily?) look at the extents of a given file and defragment only runs of sequential extents that contain no shared extents. Currently a script could do this by reading the FIEMAP and running a series of defragment operations, each with its own start and length. It would be better to have this done internally as an option to defragment. The documentation for this option should note that it performs a sub-optimal defragmentation.

crass · Oct 01 '22 07:10

The defrag ioctl needs an update to pass the 'none' option and to pass the level. A flag to skip shared extents can also be added, but yes, this can be emulated in userspace now. The snapshot-aware defrag was disabled and removed some time ago; its performance was bad and it had large memory requirements. IIRC we've discussed whether reflink breaking should be optional, and it seems there are users who want that too.
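On the kernel side, a per-range defrag request is a fixed 48-byte argument block. The Python sketch below packs `struct btrfs_ioctl_defrag_range_args` as laid out in the btrfs UAPI header (start, len, flags, extent_thresh, compress_type, unused[4]) and issues BTRFS_IOC_DEFRAG_RANGE, which a userspace emulation could call once per range instead of spawning the CLI. `HYPOTHETICAL_DEFRAG_RANGE_SKIP_SHARED` is invented purely to show where a skip-shared flag would slot in; no kernel defines it, and it must not be passed to a real kernel. The ioctl number assumes the asm-generic encoding used on x86-64 and arm64, and the helper names are hypothetical.

```python
import fcntl
import os
import struct

BTRFS_IOCTL_MAGIC = 0x94


def _iow(type_, nr, size):
    # asm-generic ioctl encoding (x86-64, arm64): direction WRITE (1) in bits
    # 30-31, argument size in bits 16-29, type in bits 8-15, number in bits 0-7.
    return (1 << 30) | (size << 16) | (type_ << 8) | nr


# struct btrfs_ioctl_defrag_range_args:
#   __u64 start; __u64 len; __u64 flags;
#   __u32 extent_thresh; __u32 compress_type; __u32 unused[4];
DEFRAG_RANGE_ARGS = struct.Struct("=QQQII4I")
BTRFS_IOC_DEFRAG_RANGE = _iow(BTRFS_IOCTL_MAGIC, 16, DEFRAG_RANGE_ARGS.size)

BTRFS_DEFRAG_RANGE_COMPRESS = 1
BTRFS_DEFRAG_RANGE_START_IO = 2
# HYPOTHETICAL: not defined by any kernel. The bit value is arbitrary and only
# marks where a "skip shared extents" request could be carried.
HYPOTHETICAL_DEFRAG_RANGE_SKIP_SHARED = 1 << 8


def pack_defrag_range(start, length, flags=0, extent_thresh=0, compress_type=0):
    """Build the argument block; extent_thresh=0 leaves the threshold to the kernel default."""
    return DEFRAG_RANGE_ARGS.pack(start, length, flags, extent_thresh,
                                  compress_type, 0, 0, 0, 0)


def defrag_range(path, start, length, flags=BTRFS_DEFRAG_RANGE_START_IO):
    """Defragment [start, start + length) of a file on btrfs with one ioctl call."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, pack_defrag_range(start, length, flags))
    finally:
        os.close(fd)
```

Keeping the skip-shared request as a flag bit fits the existing `flags` field, which is how compression and start-I/O are already requested.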

kdave · Oct 03 '22 11:10

While just skipping shared extents would be a nice start, it would be amazing if shared extents could be rewritten to a defragmented version. Obviously this will cause issues if the extents are so mixed up that no linear sequence of bytes could satisfy them all, but I expect that in many cases there is basically one large extent shared by many files (possibly in snapshots) with small modifications. In cases like these, writing out one large sequential common extent and having all the referencing files use it would be very useful. (Although I don't know whether enough metadata is available to do this efficiently.)

The simplest case of this would be:

  1. Create fragmented file.
  2. Copy using reflinks a handful of times.
  3. Defragment.

It would be great if this could be handled such that in the end all files end up pointing to one defragmented extent.
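The simple case above can be scripted for experiments on a scratch btrfs directory. In this hedged Python sketch, the file's even-numbered chunks are written before its odd-numbered ones with an fsync after each write, which usually leaves logically adjacent chunks in separate extents. It then makes reflink copies with GNU `cp --reflink=always`, defragments the original with `btrfs filesystem defragment -f` (flush after defragmenting), and prints the extent maps with `filefrag -v` before and after. The chunk size, chunk count, file names and helper names are arbitrary choices; whether the original really ends up fragmented depends on the allocator and mount options (compression, for instance, changes extent sizes).

```python
import os
import subprocess

CHUNK = 64 * 1024  # arbitrary chunk size


def fragmenting_write_order(n_chunks):
    """Chunk indices: all even ones first, then the odd ones that fill the gaps."""
    return list(range(0, n_chunks, 2)) + list(range(1, n_chunks, 2))


def make_fragmented_file(path, n_chunks=32):
    """Step 1: write random chunks out of order, syncing after each write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in fragmenting_write_order(n_chunks):
            os.pwrite(fd, os.urandom(CHUNK), i * CHUNK)
            os.fsync(fd)  # allocate now so neighbouring chunks land apart on disk
    finally:
        os.close(fd)


def show_extents(paths):
    # filefrag -v lists every extent and flags reflinked ones as "shared".
    subprocess.run(["filefrag", "-v", *paths], check=True)


def reproduce(directory, copies=3):
    orig = os.path.join(directory, "orig")
    make_fragmented_file(orig)
    # Step 2: each reflink copy shares all of orig's extents.
    clones = [os.path.join(directory, f"copy{n}") for n in range(copies)]
    for dst in clones:
        subprocess.run(["cp", "--reflink=always", orig, dst], check=True)
    show_extents([orig, *clones])
    # Step 3: defragment only the original; -f flushes the rewritten data.
    subprocess.run(["btrfs", "filesystem", "defragment", "-f", orig], check=True)
    show_extents([orig, *clones])
```

Running `reproduce("/mnt/scratch")` should show the copies' extents still flagged shared with each other after the defragment while the original's new extents are not, which is the reflink breakage this issue describes.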

A slightly more complex case would be:

  1. Create fragmented file.
  2. Copy using reflinks a handful of times.
  3. Make small modifications to some of the copies.
  4. Defragment.

It would be great if the result were similar to what it would have been had the original file been defragmented between steps 1 and 2.

But I understand if this would need to be a follow-up issue.

kevincox · Nov 18 '25 20:11