python-btrfs
btrfs-balance-least-used: sort block groups by bytes used, not percentage used
This way, we can quickly clean up completely empty block groups.
Additionally, if `btrfs-balance-least-used -u 0` is invoked, only consider fully empty block groups, because this seems a useful edge case (there can be a lot of almost-empty block groups which are technically `used_pct == 0` but still take a lot of time to balance, so the user may as well invoke with `-u 1`).
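For illustration, this is roughly what the change boils down to (a hypothetical sketch, not the actual b-b-l-u code; `block_groups` is a stand-in for the candidate list, while `used` and `used_pct` are the real `BlockGroupItem` attributes discussed below):

```python
# Hypothetical sketch of the proposed sort-key change; block_groups is
# a stand-in list of BlockGroupItem objects.
block_groups.sort(key=lambda bg: bg.used_pct)  # before: rounded percentage
block_groups.sort(key=lambda bg: bg.used)      # after: raw bytes used

# With -u 0, only consider block groups that are truly empty.
candidates = [bg for bg in block_groups if bg.used == 0]
```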
Hi! That's interesting.
Do I understand correctly that:
- the actual goal of the change is to have `-u 0` not act on block groups that are not completely empty, and using bytes instead of percentage is a means to accomplish that?

I ask because the title of the whole merge request is "sort block groups by bytes used, not percentage".
Ok, I think I figured out what happens here.
So, the current code for `used_pct` in `BlockGroupItem` is:

```python
@property
def used_pct(self):
    """Convenience property that calculates the percentage of usage."""
    return int(round((self.used * 100) / self.length))
```
(Side note: the `int()` is redundant, since Python 3 `round()` already returns an `int`, and can be dropped now.)
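A quick interactive check of that:

```python
>>> round(0.78125)        # Python 3: round() with one argument returns an int
1
>>> type(round(0.78125))
<class 'int'>
```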
The `round()` causes usage 0.49% to display as 0 and 99.51% as 100. I don't know why I did this; it has been like that from the very beginning. The number was also meant as a convenience for quickly displaying something, not as a very accurate number to further act on.
When writing `b-b-l-u` much later, I didn't really think about this. So, yes, everything up to 0.4999...% usage is mixed in with true 0% in the same step, and all of them display as 0%.
However, the suggested improvement to use actual bytes used also has a problem, because block groups can have different sizes. We'd rather first reclaim the unused space in a 10GiB RAID0 block group with 1023MiB used (~10%) than in a 1GiB RAID0 block group with 1023MiB used (~100%).
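To make that concrete, the same bytes-used value gives very different fill levels (just the arithmetic, not real block groups):

```python
SZ_1M = 0x00100000
SZ_1G = 0x40000000

used = 1023 * SZ_1M  # identical bytes used in both block groups

# 10GiB block group: ~10% full, a cheap and rewarding balance target
print(used * 100 / (10 * SZ_1G))  # 9.990234375

# 1GiB block group: ~100% full, barely worth balancing
print(used * 100 / SZ_1G)         # 99.90234375

# Sorting by bytes used alone would rank these two identically.
```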
If I try to think as a new user who uses `b-b-l-u` for the first time and sees the `-u` option and percentage numbers flying by... then I would intuitively expect it to work like this:
- 0% is really empty, zero bytes used
- 100% is really full, all bytes used
- 1% means just a little bit of stuff, but not completely empty
- 99% means almost full, but not totally
- for values closer to the middle: ok, whatever, there's stuff inside
Does this make sense?
So, neither `round()`, `floor()`, nor `ceil()` does the trick, but we could do something yolo by making 50% a bit magnetic, which results in the intuitively expected behaviour...
```python
#!/usr/bin/python3
import math

SZ_1M = 0x00100000
SZ_1G = 0x40000000


def bla(used, length, fn):
    float_pct = (used * 100) / length
    print("{} {} {}".format(used, float_pct, fn(float_pct)))


def doit(fn):
    bla(0, SZ_1G, fn)
    bla(SZ_1M, SZ_1G, fn)
    bla(8 * SZ_1M, SZ_1G, fn)
    bla(1019 * SZ_1M, SZ_1G, fn)
    bla(1024 * SZ_1M, SZ_1G, fn)


print("h'okay, so, what a 1GiB BlockGroupItem used_pct does now is:")
doit(round)
print()


def yolo(float_pct):
    if float_pct < 50:
        return math.ceil(float_pct)
    return math.floor(float_pct)


print("why not this:")
doit(yolo)
```
output:

```
h'okay, so, what a 1GiB BlockGroupItem used_pct does now is:
0 0.0 0
1048576 0.09765625 0
8388608 0.78125 1
1068498944 99.51171875 100
1073741824 100.0 100

why not this:
0 0.0 0
1048576 0.09765625 1
8388608 0.78125 1
1068498944 99.51171875 99
1073741824 100.0 100
```
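If we'd go that way, the property could become something like this (just a sketch of the idea, not a tested patch):

```python
import math

@property
def used_pct(self):
    """Convenience property that calculates the percentage of usage.

    0 and 100 are only reported for really empty and really full block
    groups; everything in between is rounded towards the middle.
    """
    float_pct = (self.used * 100) / self.length
    if float_pct < 50:
        return math.ceil(float_pct)
    return math.floor(float_pct)
```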
@intelfx What do you think of the above suggestion to always make 0% really 0% and 100% really exactly 100%?