compsize should also show dedupe% and total% not only comp%

Open kakra opened this issue 5 years ago • 6 comments

It would be nice if compsize would show two additional columns to display the dedupe rate and total rate:

comp% = usage vs. uncompressed
dedupe% = uncompressed vs. referenced
total% = usage vs. referenced
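As a rough sketch of the three ratios listed above, assuming the byte totals for usage, uncompressed and referenced are already available (the function name `ratios` and the sample numbers are made up for illustration, not taken from compsize):

```python
def ratios(usage: int, uncompressed: int, referenced: int) -> dict[str, float]:
    """Return the three proposed percentages from byte totals."""

    def pct(numerator: int, denominator: int) -> float:
        # Avoid division by zero for empty files.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "comp%": pct(usage, uncompressed),         # usage vs. uncompressed
        "dedupe%": pct(uncompressed, referenced),  # uncompressed vs. referenced
        "total%": pct(usage, referenced),          # usage vs. referenced
    }


# Hypothetical totals in bytes, not real compsize output.
r = ratios(usage=40 * 2**30, uncompressed=80 * 2**30, referenced=100 * 2**30)
for name, value in r.items():
    print(f"{name:8s} {value:6.1f}")
```

With these definitions, total% is simply comp% × dedupe% / 100, so the third column adds no new measurement, only convenience.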

See-also: https://github.com/Zygo/bees/issues/102

kakra • Sep 28 '20 08:09

uncompressed vs. referenced won't do that. There are other effects besides dedupe that affect the difference: slack at the end of incompletely filled pages, partially pinned extents, etc.

The latter in particular can make a huge difference: a 1GB-4095B extent that has its first 1GB-4096B overwritten will have uncompressed=1GB referenced=1B. Of course, this is a pathological case, but e.g. VM images are likely to have 3× or so overhead, while being huge.

kilobyte • Sep 28 '20 11:09

If you ask me, we could simply drop the dedupe% idea. I'd be more interested to additionally see usage vs. referenced.

Or invent a better name for it... ;-)

Or maybe put an asterisk and say "* may include slack/partial pinning overhead".

kakra • Sep 28 '20 12:09

Yeah, a neutral name that covers both reductions and increases would work.

I just have trouble coming up with one :/

kilobyte • Sep 28 '20 18:09

The worst case on modern btrfs is only 32768x expansion (128M max extent, 4K referenced). The 1GB-4K case is only possible with a filesystem written before the extent size limit was imposed.
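The arithmetic behind both figures can be checked with a few lines, assuming "128M" and "4K" mean 128 MiB and 4 KiB and "1GB" means 1 GiB (the helper name `expansion` is only for illustration):

```python
KiB, MiB, GiB = 2**10, 2**20, 2**30


def expansion(extent_bytes: int, referenced_bytes: int) -> float:
    """Bytes kept pinned on disk per byte that is still referenced."""
    return extent_bytes / referenced_bytes


# Modern limit: a 128 MiB extent with a single 4 KiB block still referenced.
modern_worst = expansion(128 * MiB, 4 * KiB)

# Legacy case from earlier in the thread: a (1 GiB - 4095 B) extent
# whose first (1 GiB - 4096 B) have been overwritten.
legacy_extent = GiB - 4095
legacy_referenced = legacy_extent - (GiB - 4096)

print(f"modern worst case: {modern_worst:.0f}x")
print(f"legacy case: {legacy_referenced} byte referenced, "
      f"{legacy_extent} bytes pinned")
```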

Does compsize have the necessary structure to estimate unreferenced blocks (contained in referenced extents, but not referenced by the files)? That's a very interesting number too, as a metric for triggering defrag runs.
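To make the "unreferenced blocks" number concrete, here is a minimal sketch of how it could be estimated for one uncompressed extent, assuming you already have the list of (offset, length) references into that extent. This data layout and the function names are hypothetical and say nothing about how compsize actually stores things; the point is only that overlapping references (e.g. from reflinks) must be merged before subtracting.

```python
def covered_bytes(ranges):
    """Length of the union of half-open (start, end) byte ranges."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(ranges):
        if cur_end is None or start > cur_end:
            # Gap before this range: close out the previous merged run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def unreferenced_bytes(extent_len, refs):
    """refs: (offset, length) pairs pointing into one extent."""
    ranges = [(off, min(off + length, extent_len)) for off, length in refs]
    return extent_len - covered_bytes(ranges)


# Hypothetical 128 KiB extent: two overlapping references at the start,
# plus one 4 KiB reference at the tail.
extent_len = 128 * 1024
refs = [(0, 16384), (8192, 16384), (126976, 4096)]
wasted = unreferenced_bytes(extent_len, refs)
print(f"{wasted} of {extent_len} bytes are pinned but unreferenced")
```

Summing this per extent and comparing it against the total extent size would give the kind of ratio that could serve as a defrag trigger.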

Zygo • Sep 28 '20 18:09

"usage" and "uncompressed" are the sums of the sizes of all extents pinned by the file: that 128MB/1GB extent counts at its full length if even a single byte is referenced. This might be what you want.

For a compressed extent, speaking of unreferenced blocks inside it doesn't even make sense, because the whole extent is needed. I'm not sure if introducing a metric that's calculable only for a subset of files makes sense.

kilobyte • Sep 28 '20 19:09

Unreferenced can be calculated for compressed extents, but only on the uncompressed size, where a block boundary makes any kind of sense. It could be approximated, but it's probably not worth it. I was just thinking off the top of my head.

Zygo • Sep 29 '20 00:09