compsize
compsize should also show dedupe% and total%, not only comp%
It would be nice if compsize showed two additional columns for the dedupe rate and the total rate:
comp% = usage vs. uncompressed
dedupe% = uncompressed vs. referenced
total% = usage vs. referenced
See-also: https://github.com/Zygo/bees/issues/102
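To make the three proposed columns concrete, here is a minimal sketch of how they relate. The function name `ratios` and the sample byte counts are made up for illustration; this is not compsize's code or output format, only the arithmetic from the definitions above.

```python
def ratios(usage, uncompressed, referenced):
    """Return the three proposed percentages from three byte totals.

    usage        -- bytes occupied on disk (hypothetical input)
    uncompressed -- bytes of the same extents once decompressed
    referenced   -- bytes actually referenced by the files
    """
    def pct(num, den):
        # Avoid dividing by zero for empty inputs.
        return 100.0 * num / den if den else None

    return {
        "comp%": pct(usage, uncompressed),         # usage vs. uncompressed
        "dedupe%": pct(uncompressed, referenced),  # uncompressed vs. referenced
        "total%": pct(usage, referenced),          # usage vs. referenced
    }


# Illustrative numbers only: 40 bytes on disk, 100 uncompressed, 200 referenced.
example = ratios(usage=40, uncompressed=100, referenced=200)
```

Note that total% is simply comp% multiplied by dedupe% (as fractions), so only two of the three columns carry independent information.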
Uncompressed vs. referenced won't give you a dedupe rate. There are other effects besides dedupe that affect the difference: slack at the end of incompletely filled pages, partially pinned extents, etc.
The latter in particular can make a huge difference: a 1GB-4095B extent that has its first 1GB-4096B overwritten will have uncompressed=1GB, referenced=1B. Of course, this is a pathological case, but e.g. VM images are likely to have 3× or so overhead while being huge.
If you ask me, we could simply drop the dedupe% idea. I'd be more interested to additionally see usage vs. referenced.
Or invent a better name for it... ;-)
Or maybe put an asterisk and say "* may include slack/partial pinning overhead".
Yeah, a neutral name that covers both reductions and increases would work.
I just have trouble coming up with one :/
The worst case on modern btrfs is only 32768x expansion (128M max extent, 4K referenced). The 1GB-4K case is only possible with a filesystem written before the extent size limit was imposed.
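The two worst-case figures in this thread can be checked with a few lines of arithmetic. This is only a sketch: the helper name `expansion` is hypothetical, and the sizes are the ones quoted in the comments (128M max extent, 4K referenced, and the older 1GB example).

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def expansion(extent_bytes, referenced_bytes):
    """How many bytes stay pinned per byte that is still referenced."""
    return extent_bytes / referenced_bytes


# Modern case from the comment above: 128M extent, one 4K block still referenced.
modern = expansion(128 * MIB, 4 * KIB)

# Older pathological case: a (1GB - 4095B) extent whose first (1GB - 4096B)
# were overwritten, leaving a single referenced byte.
legacy_extent = GIB - 4095
legacy_referenced = legacy_extent - (GIB - 4096)
legacy = expansion(legacy_extent, legacy_referenced)
```

With these inputs `modern` comes out at 32768 and `legacy_referenced` at 1 byte, matching the numbers quoted in the discussion.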
Does compsize have the necessary structure to estimate unreferenced blocks (contained in referenced extents, but not referenced by the files)? That's a very interesting number too, as a metric for triggering defrag runs.
"usage" and "uncompressed" are the sum of sizes of all extents pinned by the file — that 128MB/1GB extent counts as its full length if even a single byte is referenced. This might be what you want.
For a compressed extent, speaking of unreferenced blocks inside doesn't even make sense — the whole extent is needed. I'm not sure if introducing a metric that's calculable only for a subset of files makes sense.
Unreferenced can be calculated for compressed extents, but only on the uncompressed size where a block boundary makes any kind of sense. It could be approximated, but probably not worth it. I was thinking off the top of my head.
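For what it's worth, the "unreferenced" number discussed above could be sketched like this for a single extent. Everything here is an assumption for illustration: the function name, the `(offset, length)` input format, and the idea that the caller has already collected every reference into the extent. compsize does not expose anything like this as far as this thread shows. Offsets are in the extent's uncompressed byte space, which is the only place a block boundary makes sense for a compressed extent.

```python
def unreferenced_bytes(extent_len, refs):
    """Bytes of an extent not covered by any reference.

    extent_len -- uncompressed length of the extent in bytes
    refs       -- iterable of (offset, length) pairs inside the extent,
                  possibly overlapping (e.g. reflinked copies)
    """
    covered = 0
    covered_end = 0  # everything below this offset is already counted
    for off, length in sorted(refs):
        start = max(off, covered_end)
        end = min(off + length, extent_len)
        if end > start:
            covered += end - start
            covered_end = end
    return extent_len - covered


# A 128M extent with only its first 4K block still referenced.
wasted = unreferenced_bytes(128 * 1024 * 1024, [(0, 4096)])
```

Summing this over all extents pinned by a file would give one candidate metric for triggering defrag, with the caveat from the previous comments that for compressed extents the on-disk savings are not proportional to the unreferenced uncompressed bytes.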