python-btrfs icon indicating copy to clipboard operation
python-btrfs copied to clipboard

btrfs-balance-least-used: sort block groups by bytes used, not percentage used

Open intelfx opened this issue 3 years ago • 3 comments

This way, we can quickly clean up completely empty block groups.

Additionally, if btrfs-balance-least-used -u 0 is invoked, only consider fully empty block groups because this seems a useful edge case (there can be a lot of almost-empty block groups which are technically used_pct==0 but still take a lot of time to balance, so the user may as well invoke with -u 1).

intelfx avatar Sep 05 '21 08:09 intelfx

Hi! That's interesting.

Do I understand correctly that:

  1. the actual goal of the change is to have -u 0 not act on block groups that are not completely empty
  2. using bytes instead of pct is a means to accomplish that?

This question because the title of the whole merge request is "sort block groups by bytes used, not percentage".

knorrie avatar Nov 20 '21 20:11 knorrie

Ok, I think I figured out what happens here.

So, the current code for used_pct in BlockGroupItem is:

@property
def used_pct(self):
    """Convenience property that calculates the percentage of usage."""
    return int(round((self.used * 100) / self.length))

(side note: the int() is redundant, since Python 3 round() already returns int, and can be dropped now.)

The round() causes usage 0.49% to display as 0 and 99.51% as 100. I don't know why I did this, it has been like that from the very beginning. The number was also meant as a convenience thing for quickly displaying something, not for providing a very accurate number to further act on.

When writing b-b-l-u much later, I didn't really think about this. So, yes, everything up to 0.49999..% usage is mixed with 0% in the same step and all of them display as 0%.

However, the suggested improvement to use actual bytes used also has a problem, because block groups can have different sizes. We'd rather first reclaim the unused space in a 10GiB RAID0 bg with 1023MiB used (~10%), than a 1GiB RAID0 bg with 1023MiB used (~100%).

If I try to think as a new user who uses b-b-l-u for the first time, and sees the -u option and percentage numbers flying by... Then I would kind of intuitively implicitly think it would work like this:

  • 0% is really empty, zero bytes used
  • 100% is really full, all bytes used
  • 1% means just a little bit of stuff, but not completely empty
  • 99% means almost full, but not totally
  • for values closer to the middle: ok, whatever, there's stuff inside

Does this make sense?

So, neither round() nor floor() or ceil() do the trick, but we could do something yolo by making 50% a bit magnetic, which results in the intuitively expected behaviour...

#!/usr/bin/python3

import math

SZ_1M = 0x00100000
SZ_1G = 0x40000000


def bla(used, length, fn):
    float_pct = (used * 100) / length
    print("{} {} {}".format(used, float_pct, fn(float_pct)))


def doit(fn):
    bla(0, SZ_1G, fn)
    bla(SZ_1M, SZ_1G, fn)
    bla(8 * SZ_1M, SZ_1G, fn)
    bla(1019 * SZ_1M, SZ_1G, fn)
    bla(1024 * SZ_1M, SZ_1G, fn)


print("h'okay, so, what a 1GiB BlockGroupItem used_pct does now is:")
doit(round)
print()


def yolo(float_pct):
    if float_pct < 50:
        return math.ceil(float_pct)
    return math.floor(float_pct)


print("why not this:")
doit(yolo)

output:

h'okay, so, what a 1GiB BlockGroupItem used_pct does now is:
0 0.0 0
1048576 0.09765625 0
8388608 0.78125 1
1068498944 99.51171875 100
1073741824 100.0 100

why not this:
0 0.0 0
1048576 0.09765625 1
8388608 0.78125 1
1068498944 99.51171875 99
1073741824 100.0 100

knorrie avatar Nov 20 '21 22:11 knorrie

@intelfx What do you think of the above suggestion to always make 0% really 0% and 100% really exactly 100%?

knorrie avatar Dec 17 '21 18:12 knorrie