libzutil: allow displaying powers of 1000 bytes
Motivation and Context
ZFS displays bytes with K/M/G/T/P/E prefixes. They represent powers of 1024 bytes, i.e. KiB, MiB, GiB, TiB, PiB, EiB. Some users may want these prefixes to represent powers of 1000 bytes, i.e. KB, MB, GB, TB, PB, EB.
Description
This adds a new unit format and allows enabling this display mode by defining an environment variable.
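A minimal sketch of the general idea (the environment variable name and helper below are illustrative only, not necessarily the exact names used in this change):

```c
/*
 * Illustrative sketch only: select the divisor used when pretty-printing
 * byte values. The env var name here is hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static uint64_t
nicebytes_divisor(void)
{
	/* Default: powers of 1024 (KiB, MiB, ...). */
	if (getenv("ZFS_BYTES_POWER_1000") != NULL)
		return (1000);	/* powers of 1000 (KB, MB, ...) */
	return (1024);
}

static void
nicebytes(uint64_t bytes, char *buf, size_t buflen)
{
	const char *units = "BKMGTPE";
	uint64_t div = nicebytes_divisor();
	double v = (double)bytes;
	int i = 0;

	while (v >= (double)div && units[i + 1] != '\0') {
		v /= (double)div;
		i++;
	}
	(void) snprintf(buf, buflen, "%.1f%c", v, units[i]);
}

int
main(void)
{
	char buf[16];

	nicebytes(1536ULL * 1024 * 1024, buf, sizeof (buf));
	/* "1.5G" by default, "1.6G" with the env var set */
	(void) printf("%s\n", buf);
	return (0);
}
```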
How Has This Been Tested?
Not tested. If this draft gathers interest then I will add tests.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain `Signed-off-by`.
This PR works when the value is a byte value, but there are other places where we use ZFS_NICENUM_1024 (perhaps incorrectly) that won't be affected. For example, the JSON code uses ZFS_NICENUM_1024 in a bunch of places: https://github.com/openzfs/zfs/blob/ca0141f325ec706d38a06f9aeb8e5eb6c6a8d09a/cmd/zpool/zpool_main.c#L9177
Hello and thanks for the feedback
@mcmilk :
There should also be a testcase for this new formatting.
Sure. Could you direct me to an existing test case that I could use as a base?
@tonyhutter :
This PR works when the value is a byte value, but there are other places where we use ZFS_NICENUM_1024 (perhaps incorrectly) that won't be affected. For example, the JSON code uses ZFS_NICENUM_1024 in a bunch of places:
Sorry I missed that.
I can see that ZFS_NICENUM_BYTES is used directly there. This PR does not work in that case currently, and I am not sure how to fix that.
If there are places where ZFS_NICENUM_1024 is used to represent bytes, could that be fixed in another PR by using ZFS_NICENUM_BYTES?
https://github.com/openzfs/zfs/blob/ca0141f325ec706d38a06f9aeb8e5eb6c6a8d09a/cmd/zpool/zpool_main.c#L9176-L9177
In this example, it seems weird to express a number of errors in powers of 1024. (I think nobody expects "100K errors" to be 100Ki = 102400 errors). Maybe a new unit like ZFS_NICENUM_1000 should be used instead?
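To make the oddity concrete, a quick demo against the existing formatter (built against the in-tree libzutil; ZFS_NICENUM_1000 does not exist yet, it is only the value being proposed here):

```c
/* Demo of how ZFS_NICENUM_1024 renders an error count today. */
#include <stdio.h>
#include <libzutil.h>

int
main(void)
{
	char buf[32];

	/* 102400 errors are shown as "100K", i.e. 100 * 1024. */
	zfs_nicenum_format(102400, buf, sizeof (buf), ZFS_NICENUM_1024);
	(void) printf("%s\n", buf);

	/*
	 * A hypothetical ZFS_NICENUM_1000 would divide by 1000 instead,
	 * so "100K" would mean 100000 errors.
	 */
	return (0);
}
```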
There should also be a testcase for this new formatting.
@jcassette - Sorry, there is currently no such test case; I thought I had seen such a test. So no - it seems that this is not needed. Sorry for the noise.
I suspect we historically went with the ambiguous K/M/G/T prefixes to get more precision in the zpool iostat|status 5-char columns. That is, you could print "500.4M" rather than "500MB" or "500MiB" (which wouldn't even fit in 5 chars...)
One possible solution could consist of:
- ZFS_KB_IS_1000 envar - Default to 1000 instead of 1024 for byte values
- ZFS_USE_IEC_PREFIX envar - Print KB/MB/GB/TB (base 10) or KiB/MiB/GiB/TiB (base 2) instead of K/M/G/T. This might be a pain to implement due to the 5-char columns, but I suspect it would still be doable.
- Use ZFS_NICENUM_1000 for error counters in JSON output (and anywhere else it's needed).
I'm fine with just number 1 being tackled in this PR, but you can try to fix the other ones if you want.
For broader context, @jumbi77 referenced PR #14598.
@jumbi77 thanks for the heads-up on that PR. I think we may need to be even more careful about ZFS_KB_IS_1000 since we've only talked about it for displaying numbers, not setting them. For example, this gets a little ambiguous:
export ZFS_KB_IS_1000=1
zfs set recordsize=8K tank
So we may want to rename the envar to something that wouldn't be ambiguous for those cases. Like include the word "display" or "output" in the envar name or something.
I have addressed the requested changes. Ready for review.
is there anyone actually asking for such functionality? with what rationale if I may? "may want to use" is at least for me personally not really enough :)
what good is it going to be if it's hidden behind an envvar nobody knows about? this is added code, added maintenance burden one might say, but very little benefit? just because someone "may want"? ;)
@snajpa
is there anyone actually asking for such functionality? with what rationale if I may? "may want to use" is at least for me personally not really enough :)
#14598 and #11046 ?
this is added code, added maintenance burden one might say, but very little benefit?
Okay, just let me know what you decide
@jcassette awesome thank you, I should have said that I'm genuinely interested in knowing that, to see whether we could maybe come up with something better than an env var...
also I'm in no position to reject this I would say :)) even if there was no one asking for it... I just have opinions which I'm sharing, up to you to either toss them away or maybe rethink, really
but personally I do agree with @ryao 's "It would be best to pick one convention and stick with it"
I'd also argue that probably no one presents these prettified ZFS numbers to (uninformed) users, I would dare to generalize that everyone uses some kind of raw value readouts, simply because that just makes the most sense - store measurements in base units...
least amount of work now and also going forward seems to be just reverting to how things were, especially if those two tickets are more about fixing problems caused by a change that - really, as far as I can see - nobody asked for? :D I mean this is kinda comical, wouldn't you yourself consider this a rather unwanted drive-by contribution? ->
https://github.com/openzfs/zfs/pull/13363 - I would... reason: absolutely no considerations about downstream effects, probably too low operational familiarity with ZFS (would have known and thought about the 5 chars limitation otherwise)
(again, just my personal opinions, I'm not a lead anything here, can also blame self for not showing up for review during that time - or much at all, which I'm trying to improve ;))
After having thought about it some more, I don't want to be seen as the "old grumpy guy who won't change his habits so they die with him" by history, if you know what I mean :))
But I think it wouldn't be unreasonable to expect the change to be 100% consistent across the whole codebase.
https://github.com/openzfs/zfs/pull/14598#issuecomment-2500767831
If it helps, here's a branch I was working on last year trying to do roughly the same thing: https://github.com/robn/zfs/commits/byte-prefix/. Please feel free to steal from it, at least the first two commits, which fix a bunch of places where nicenum is called when it should be nicebytes.
I did a selector for three variations:
- trad: powers of 1024 with SI prefixes
- iec: powers of 1024 with IEC prefixes
- si: powers of 1000 with SI prefixes
I too used an envvar, ZFS_BYTE_PREFIX, to set it. I was also going to add a --units or similar arg to commands to make it selectable there. My intention was that we'd use the traditional format as default for now, but an operator could override it per-run or globally via the envvar. At some future point (major release maybe), we switch the default to iec, and anyone who wants the old behaviour can set the global var. (si is there for people that don't think about it much beyond "1000G" on the back of the box at the hard drive shop).
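Roughly what such a selector could look like - this is a sketch based on the description above, not code lifted from that branch, and the prefix tables are illustrative:

```c
/*
 * Sketch of a trad/iec/si byte-prefix selector keyed off an environment
 * variable, as described above; not the actual code from the branch.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef enum {
	BYTE_PREFIX_TRAD,	/* powers of 1024, "K", "M", ... */
	BYTE_PREFIX_IEC,	/* powers of 1024, "KiB", "MiB", ... */
	BYTE_PREFIX_SI		/* powers of 1000, "kB", "MB", ... */
} byte_prefix_t;

static byte_prefix_t
byte_prefix_mode(void)
{
	const char *e = getenv("ZFS_BYTE_PREFIX");

	if (e != NULL && strcmp(e, "iec") == 0)
		return (BYTE_PREFIX_IEC);
	if (e != NULL && strcmp(e, "si") == 0)
		return (BYTE_PREFIX_SI);
	return (BYTE_PREFIX_TRAD);	/* default, and any unknown value */
}

static void
format_bytes(uint64_t bytes, char *buf, size_t buflen)
{
	static const char *tabs[3][7] = {
		{ "", "K", "M", "G", "T", "P", "E" },
		{ "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" },
		{ "B", "kB", "MB", "GB", "TB", "PB", "EB" },
	};
	byte_prefix_t mode = byte_prefix_mode();
	uint64_t div = (mode == BYTE_PREFIX_SI) ? 1000 : 1024;
	double v = (double)bytes;
	int i = 0;

	while (v >= (double)div && i < 6) {
		v /= (double)div;
		i++;
	}
	(void) snprintf(buf, buflen, "%.1f%s", v, tabs[mode][i]);
}

int
main(void)
{
	char buf[32];

	format_bytes(500ULL * 1024 * 1024, buf, sizeof (buf));
	/* "500.0M" (trad), "500.0MiB" (iec), or "524.3MB" (si) */
	(void) printf("%s\n", buf);
	return (0);
}
```

An operator could then override it per run, e.g. something like `ZFS_BYTE_PREFIX=si zfs list`, matching the per-run/global override described above.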
The reason I wanted to keep the traditional format by default, at least for a while, is to not break scripts scraping output. This was before JSON output was available, so it's less of a concern now, but I'd still wait for the next release if I were doing it today, to give people time to adapt.
The main reason I didn't finish this was because I couldn't see how this made any sense without also changing it for inputs (e.g. properties), because any UI needs to be consistent; having input in one format and output in another is always going to be confusing. Yet, the 1000s versions make no sense there: if we say that in the future K is always 1000 and Ki is 1024, then you cannot ever write recordsize=128K, because it has to be a power of 2. So it either always has to be recordsize=128Ki, or we allow the K to mean 1024 in those places. I could never find a way that didn't add confusion, which this entire thing is supposed to remove!
So I dropped it. I was around before KiB was cool, so for me there's no ambiguity anyway - if it's bytes, K means 1024; if not, 1000. The raw numbers are there if you need accuracy; otherwise it's a ballpark anyway.
My opinion is that the easiest option long-term is probably to go all-in on full compliance with standards, meaning the unambiguous valid short version of recordsize=131072 should be recordsize=128Ki, I think. It could also, especially if it makes implementation/integration in the wider ecosystem easier, take in values like recordsize=131.072k - that is valid too, isn't it? I don't know how many decimals it handles currently and how much we want from it, but there are also valid variants with Mi/M and so on, aren't there? :)
I think a new major release is the best opportunity where a wholesale change like this could be advertised. I'm not afraid of breaking old scripts; it's just that people need a backup option until they fix their stuff - IMHO easily provided by an older release branch. Maybe this could drive demand/expectations for longer support of such.
How about a module option (and an envvar as a fallback for userspace-only builds) for which form of displayed/rendered nicenums is preferred towards userspace? Whether to divide by 1024 (Ki/Gi..) or 1000 (k/G..).
OpenZFS 2.4 or 3.0 as a target for the big bang?
Why I think the all-in approach is the easiest option long-term:
- well, it can be confusing for some - and it'll be so for an ever increasing fraction of the userbase, simply b/c this is now being taught in schools :)
- then it'll be reflected in the codebase eventually one way or another - and it already is in the documentation (which I still think is probably best to be reverted in current branches)
- so, if it's unavoidable, how about we do it right - consistently so across the whole codebase?
I'm willing to help out (reachable on irc)
I guess mostly I don't see what the actual gain is. IME "standards compliance" is rarely by itself a good reason to do anything. If we were starting from scratch, then yes, no brainer, but there's scripts out there, there's published books, years of documentation, blog posts, videos, etc, that we'd be invalidating in one fell swoop.
For me, that's too much without a clear benefit. But I also know that I err towards the status quo far too often, waiting for supporting data that never comes. Which is why I surround myself with people who will say "nah, you're worrying too much" :)
Assuming you're saying "nah, you're worrying too much", and thinking on it more, the all-in is probably the way to do it. -p exists, and can be combined with numfmt to get a good-enough approximation of the old behaviour if you want that:
$ zfs list -p | numfmt --header --field 2-4 --to=iec --round=down --format=%.1f
NAME USED AVAIL REFER MOUNTPOINT
crayon 404.4G 26.1G 192.0K none
crayon/dump 46.8M 26.1G 45.5M /dump
crayon/home 395.8G 26.1G 200.0K /home
crayon/home/robn 394.2G 26.1G 394.2G /home/robn
crayon/home/root 1.6G 26.1G 1.6G /root
crayon/root 8.4G 26.1G 192.0K none
crayon/root/debian 8.4G 26.1G 8.4G /
crayon/var 23.5M 26.1G 192.0K /var
crayon/var/cache 15.0M 26.1G 15.0M /var/cache
crayon/var/log 7.5M 26.1G 7.5M /var/log
crayon/var/tmp 780.0K 26.1G 780.0K /var/tmp
(And -j exists if you really want that).
On the input, I can get used to recordsize=128Ki, and if your suffix makes no sense, then I'd error, and if it's one of the classics, I'd say "K? You probably meant Ki". Gently force relearning.
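As an illustration of that "gently force relearning" idea (the function, accepted suffixes, and exact messages here are hypothetical):

```c
/*
 * Illustrative sketch: reject the classic ambiguous suffixes with a hint
 * instead of guessing what the user meant.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <stdint.h>
#include <errno.h>

static int
parse_size(const char *s, uint64_t *out)
{
	static const struct { const char *sfx; uint64_t mult; } iec[] = {
		{ "Ki", 1ULL << 10 }, { "Mi", 1ULL << 20 },
		{ "Gi", 1ULL << 30 }, { "Ti", 1ULL << 40 },
	};
	char *end;
	uint64_t n;

	errno = 0;
	n = strtoull(s, &end, 10);
	if (end == s || errno != 0)
		return (-1);

	if (*end == '\0' || strcasecmp(end, "B") == 0) {
		*out = n;		/* plain bytes, e.g. 131072 or 131072B */
		return (0);
	}
	for (size_t i = 0; i < sizeof (iec) / sizeof (iec[0]); i++) {
		if (strcasecmp(end, iec[i].sfx) == 0 ||
		    (strncasecmp(end, iec[i].sfx, 2) == 0 &&
		    strcasecmp(end + 2, "B") == 0)) {
			*out = n * iec[i].mult;	/* e.g. 128Ki or 128KiB */
			return (0);
		}
	}
	if (strlen(end) == 1 && strchr("KMGTkmgt", *end) != NULL) {
		(void) fprintf(stderr, "%c? You probably meant %ci\n",
		    *end, *end);
		return (-1);
	}
	(void) fprintf(stderr, "invalid size suffix: '%s'\n", end);
	return (-1);
}

int
main(void)
{
	uint64_t v;

	if (parse_size("128K", &v) != 0)	/* prints the "probably meant Ki" hint */
		(void) parse_size("128Ki", &v);	/* accepted: 131072 */
	(void) printf("%llu\n", (unsigned long long)v);
	return (0);
}
```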
(I wouldn't go near recordsize=131.072K, for at least nine reasons, one for each circle of hell, where this clearly came from :sweat_smile:).
I wouldn't go near recordsize=131.072K, for at least nine reasons, one for each circle of hell, where this clearly came from 😅
Me neither, but I could see an implementation that could allow for this - just a few well thought-out shared functions.
nah, you're worrying too much
something like that, but also that I could think of an implementation of this which would be nice to use and maintain, both, that's why I'd go for a module option + envvar, b/c I think it otherwise won't be that much code, maybe not even that many changed lines in the end - to get both decimal and binary prefixes supported
and in tune of what I said earlier, I don't have the motivation to drive it - but I'd help if someone else did :)
I could think of an implementation of this which would be nice to use and maintain, both, that's why I'd go for a module option + envvar
Aside from script compatibility, I would not like to ask users each time about the units they used in whatever they sent me and switch my brain accordingly. Also, does it mean switching memory units too to match disk units? Measuring memory in 1000-byte units would be even weirder than disks, while not doing it would create another source of confusion. To be short: I am against this. I have to regularly explain to users why ZFS reports less disk space than the disk's marketing, but at least it has been consistent and stable for many years.
Aside from script compatibility, I would not like to ask users each time about the units they used in whatever they sent me and switch my brain accordingly. Also, does it mean switching memory units too to match disk units? Measuring memory in 1000-byte units would be even weirder than disks, while not doing it would create another source of confusion. To be short: I am against this. I have to regularly explain to users why ZFS reports less disk space than the disk's marketing, but at least it has been consistent and stable for many years.
the idea behind a global switch is to let the user choose whether they'd like decimal or binary prefixes more - but then when you get a paste, you'd see "Ki" or "k" accordingly. The main point of a breaking change (away from the 5-character limit) would be to make ZFS use the prefixes correctly, so it's obvious from a random paste which is which, without any "tribal knowledge"
Also, does it mean switching memory units too to match disk units?
It would probably be useful if the nicenum formatting function got a hint about whether the numbers are memory-related or disk-related; that would eliminate the need to always stick to one variant for everything.
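For what it's worth, a tiny sketch of what such a hint could look like (the types and names are made up for illustration):

```c
/*
 * Hypothetical "kind" hint for the formatter, so memory sizes can stay
 * base-1024 even if capacities are displayed base-1000.
 */
#include <stdio.h>
#include <stdint.h>

typedef enum {
	NICENUM_KIND_MEMORY,	/* RAM/ARC sizes: always powers of 1024 */
	NICENUM_KIND_CAPACITY,	/* disk/pool space: user-selectable base */
	NICENUM_KIND_COUNT	/* plain counters (errors): powers of 1000 */
} nicenum_kind_t;

static uint64_t
kind_divisor(nicenum_kind_t kind, uint64_t capacity_base)
{
	switch (kind) {
	case NICENUM_KIND_MEMORY:
		return (1024);
	case NICENUM_KIND_CAPACITY:
		return (capacity_base);	/* 1000 or 1024, per user setting */
	default:
		return (1000);
	}
}

int
main(void)
{
	(void) printf("%llu\n", (unsigned long long)
	    kind_divisor(NICENUM_KIND_COUNT, 1024));	/* prints 1000 */
	return (0);
}
```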
In this example, it seems weird to express a number of errors in powers of 1024. (I think nobody expects "100K errors" to be 100Ki = 102400 errors).
I wasn't aware that ZFS actually used powers of 1024 to print number of errors, until I read the source code at that time (https://github.com/openzfs/zfs/pull/14596/commits/1a11949bc92d889c4a359017d025690e34269e17?diff=split).
I don't think most people are aware of this behavior, because it is indeed weird. Therefore I still think it should at least be documented.
It might make sense to use lower-case k, m, etc. as powers of 1000 for error counts and other non-byte values that have no particular reason to be base 1024, and to keep byte values as upper-case powers of 1024. In my experience with parted, which made a similar switch, the ambiguity of having your input "corrected" to valid multiples of page size, sector size, or whatever other power-of-2 constraint existed was more confusing than just using fdisk.
In particular, I don't think it should be possible to do something like recordsize=131.072K. I think if the units currently being displayed are SI, then the only valid inputs should be 131072, 131072B, or 128KiB. Anything else should print an error regarding ambiguous units. The whole benefit of standard units is a lack of ambiguity, so switching to them but still allowing ambiguous non-standard variants appears counterproductive.
It was brought up on yesterday's leadership call, and discussed that instead of a global switch (which would be a permanent "holy war"), it would be nice to somehow identify, without hardcoding a list, which parameters are byte-based and should use powers of 1024, and which are not and should use 1000. Though it might be non-trivial, and possibly error-prone, to parse some "1K" value submitted by a user.
and which are not and should use 1000.
This means that, suddenly, values that were previously displayed using powers of 1024 will use powers of 1000 instead. This would be an unexpected change of behavior and should therefore be avoided.