
Allow setting zstd compression level with btrfs defrag

Open ghost opened this issue 5 years ago • 26 comments

Currently btrfs defrag -c zstd uses a fixed compression level. It would be useful to be able to set it manually, the same way we can in the mount options (zstd:1-15).

Thanks.

ghost avatar Jun 20 '19 22:06 ghost

This would be great for forcing greater compression on large static files without archiving them.

denharad avatar Jun 29 '19 18:06 denharad

This is my thinking. Stale files could be compressed at a higher level without much loss of performance in case they need to be accessed. There are other use-cases too, I'm sure.

ghost avatar Jun 29 '19 20:06 ghost

This needs an enhancement of the ioctl interface; there's space in the data structure to specify the level, but it needs to be properly handled in the kernel.
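
For reference, here is roughly how userspace drives the defrag ioctl today (a minimal sketch only; the struct and BTRFS_IOC_DEFRAG_RANGE come from include/uapi/linux/btrfs.h, while the compress_type value is defined locally because the BTRFS_COMPRESS_* constants live in kernel-internal headers):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/* Matches BTRFS_COMPRESS_ZSTD in fs/btrfs/compression.h (not exported). */
#define COMPRESS_ZSTD 3

int defrag_with_zstd(const char *path)
{
	struct btrfs_ioctl_defrag_range_args range;
	int fd, ret;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;
	memset(&range, 0, sizeof(range));
	range.len = (__u64)-1;                   /* whole file */
	range.flags = BTRFS_DEFRAG_RANGE_COMPRESS;
	range.compress_type = COMPRESS_ZSTD;     /* the level is fixed today */
	/* A compression level would have to be carried in this struct too. */
	ret = ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &range);
	close(fd);
	return ret;
}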

kdave avatar Jul 02 '19 14:07 kdave

Is this being worked on? Otherwise, I'd like to attempt to implement it.

qwerty123443 avatar Jan 02 '20 19:01 qwerty123443

As visible above, the code is now written. I have no clue if I'm putting the pull request in the right place, so please let me know if I should rebase or merge into another repo.

qwerty123443 avatar Jan 04 '20 20:01 qwerty123443

Any progress on this? AFAIK kernel (and hence btrfs) development does not typically happen through GitHub. What would it take for the patches from the above pull requests to be merged upstream?

inodentry avatar Feb 27 '20 14:02 inodentry

@qwerty123443 I believe it would be better if you could send the patches through the mailing list :)

marcosps avatar Feb 28 '20 12:02 marcosps

Ah yes, I'd kind of forgotten about this. I'll take a look at the mailing list soon (tm). There is still some refinement to be done, and at the moment I don't have the time to fix it up properly. For anyone reading this: feel free to take the code I've written, change it, and submit it to the linux-btrfs mailing list.

qwerty123443 avatar Mar 03 '20 13:03 qwerty123443

I was wondering why the compression level is capped at 15 for zstd. The levels actually go up to 22 (when --ultra is on).

Would be nice to be able to compress files with more effort, if you don't care about the speed.

RubenKelevra avatar Apr 11 '20 09:04 RubenKelevra

I've submitted a patch to the mailing list, and I hope it'll be merged soon. @RubenKelevra It's been a while since I've looked into it, but if I recall correctly, the kernel does not support levels higher than 15 because of the way the workspaces are (were?) set up. Definitely not sure about this one, though.

The patch can be found here

qwerty123443 avatar Apr 12 '20 09:04 qwerty123443

I was wondering why the compression level is capped at 15 for zstd. The levels actually go up to 22 (when --ultra is on).

Would be nice to be able to compress files with more effort, if you don't care about the speed.

Just a guess, but maybe it has to do with the fact that btrfs compresses files in 128 KiB "ranges", where each 128 KiB extent is effectively treated as a separate file for compression purposes. This allows fast random access to compressed files, since only the 128 KiB extent you want needs to be decompressed, but it probably negates most or all of the benefits of the higher zstd levels (e.g. the larger window). You could test that by compressing some 128 KiB sample files with the command-line zstd and seeing if there's any difference between the ratio you get at 15 vs 22 (a sketch of such a test follows at the end of this comment).

The 128 KiB ranges' compressed size is also effectively rounded up to a multiple of 4 KiB, since each range is stored as a separate extent on disk. A small difference in compression ratio won't even matter unless it's enough to make that extent one 4 KiB block smaller.

Bottom line: filesystem-level compression inevitably involves trade-offs of this nature, and if you don't need the benefits it offers, like fast random access, then just use file-level compression with whatever tool you like. Zstd also supports negative compression levels now with --fast, for even faster compression along the lines of lz4, but we can't specify those with btrfs either. That might also be useful, but in my experience zstd:1 is fast enough even with a PCIe 4 SSD.
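
Here is the 128 KiB test mentioned above, as a rough sketch using libzstd's one-shot API (build with gcc test.c -lzstd; the sample file name is just a placeholder):

#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

int main(int argc, char **argv)
{
	enum { CHUNK = 128 * 1024 };
	static char src[CHUNK];
	const int levels[] = { 15, 22 };
	FILE *f;
	char *dst;
	size_t n, bound, i;

	f = fopen(argc > 1 ? argv[1] : "sample.bin", "rb");
	if (!f)
		return 1;
	n = fread(src, 1, CHUNK, f);	/* one btrfs-sized range */
	fclose(f);

	bound = ZSTD_compressBound(n);
	dst = malloc(bound);
	if (!dst)
		return 1;
	for (i = 0; i < sizeof(levels) / sizeof(levels[0]); i++) {
		size_t out = ZSTD_compress(dst, bound, src, n, levels[i]);

		if (ZSTD_isError(out))
			return 1;
		/* Round up to 4 KiB blocks, as btrfs effectively stores extents. */
		printf("level %d: %zu bytes (%zu on disk)\n",
		       levels[i], out, (out + 4095) / 4096 * 4096);
	}
	free(dst);
	return 0;
}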

automorphism88 avatar Sep 04 '21 18:09 automorphism88

Just tried btrfs filesystem defragment -r -czstd:4 and it did not work :/ Would love to have this.

jfcg avatar Mar 16 '22 10:03 jfcg

@qwerty123443 Your patch hasn't been merged into the kernel, right? I couldn't find the mailing list thread via Google; is there any way we can track the status? Thanks for putting in the work anyway!

CrawX avatar Mar 28 '22 09:03 CrawX

@qwerty123443 Your patch hasn't been merged into the kernel, right?

Correct; life happened and I unfortunately got distracted by other things. I've written a new patch that should be a lot closer to getting accepted through the mailing list, though I don't have the time to turn it into a proper submission.

The attached code compiles but isn't tested. It modifies the btrfs_ioctl_defrag_range_args struct so that you can pass a compress_level along with the compress_type, and I think it should work out of the box (given a slightly modified version of btrfs-progs that makes use of the compress_level).

If anyone reading this has the time and knowledge to finalize it, feel welcome to do so :) (most of the underlying work is already done, so it shouldn't be too much work)

I couldn't find the mailing list thread via Google; is there any way we can track the status? Thanks for putting in the work anyway!

That's a good question; I'm not sure, to be honest.

Patch can be found here: download

From 55e07d6dd155269d4b8b8bdcb03716ca5df815cb Mon Sep 17 00:00:00 2001
Date: Tue, 29 Mar 2022 11:50:39 +0200
Subject: [PATCH] fs/btrfs: Allow for a per-extent compression level

---
 fs/btrfs/inode.c           | 34 ++++++++++++++++++++++++----------
 fs/btrfs/ioctl.c           |  6 +++++-
 include/uapi/linux/btrfs.h |  9 ++++++++-
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index aa0a60ee26cb..9f751076a00e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -597,6 +597,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	int i;
 	int will_compress;
 	int compress_type = fs_info->compress_type;
+	int compress_level = 0;
 	int compressed_extents = 0;
 	int redirty = 0;

@@ -675,8 +676,10 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 			goto cont;
 		}

-		if (BTRFS_I(inode)->defrag_compress)
-			compress_type = BTRFS_I(inode)->defrag_compress;
+		if (BTRFS_I(inode)->defrag_compress) {
+			compress_type = BTRFS_I(inode)->defrag_compress & 0xF;
+			compress_level = BTRFS_I(inode)->defrag_compress >> 4;
+		}
 		else if (BTRFS_I(inode)->prop_compress)
 			compress_type = BTRFS_I(inode)->prop_compress;

@@ -697,14 +700,25 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 			redirty = 1;
 		}

-		/* Compression level is applied here and only here */
-		ret = btrfs_compress_pages(
-			compress_type | (fs_info->compress_level << 4),
-					   inode->i_mapping, start,
-					   pages,
-					   &nr_pages,
-					   &total_in,
-					   &total_compressed);
+		/* Compression level is applied here and only here
+		 * Takes the compression level from the inode, if set
+		 */
+		if (compress_level)
+			ret = btrfs_compress_pages(
+				compress_type | compress_level << 4,
+				inode->i_mapping, start,
+				pages,
+				&nr_pages,
+				&total_in,
+				&total_compressed);
+		else
+			ret = btrfs_compress_pages(
+				compress_type | fs_info->compress_level << 4,
+				inode->i_mapping, start,
+				pages,
+				&nr_pages,
+				&total_in,
+				&total_compressed);

 		if (!ret) {
 			unsigned long offset = offset_in_page(total_compressed);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 238cee5b5254..7534e13c487d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1777,6 +1777,7 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
 	bool ra_allocated = false;
 	int compress_type = BTRFS_COMPRESS_ZLIB;
+	int compress_level = 0;
 	int ret = 0;
 	u32 extent_thresh = range->extent_thresh;
 	pgoff_t start_index;
@@ -1792,6 +1793,8 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 			return -EINVAL;
 		if (range->compress_type)
 			compress_type = range->compress_type;
+		if (range->compress_level)
+			compress_level = range->compress_level;
 	}

 	if (extent_thresh == 0)
@@ -1855,7 +1858,8 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 			break;
 		}
 		if (do_compress)
-			BTRFS_I(inode)->defrag_compress = compress_type;
+			BTRFS_I(inode)->defrag_compress = compress_type |
+							  compress_level << 4;
 		ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
 				cluster_end + 1 - cur, extent_thresh,
 				newer_than, do_compress, &sectors_defragged,
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index d956b2993970..b330d0896a5a 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -605,8 +605,15 @@ struct btrfs_ioctl_defrag_range_args {
 	 */
 	__u32 compress_type;

+	/*
+	 * which compression strength to use if turning on compression
+	 * for this defrag operation.  If unspecified, the inode's value
+	 * will be used
+	 */
+	__u32 compress_level;
+
 	/* spare for later */
-	__u32 unused[4];
+	__u32 unused[3];
 };


--
2.30.2


qwerty123443 avatar Mar 29 '22 10:03 qwerty123443

Would be nice to be able to compress files with more effort, if you don't care about the speed.

As I expected, the compression levels are indeed limited to the following:

  • LZO: 1
  • zlib: 9
  • zstd: 15

qwerty123443 avatar Mar 29 '22 10:03 qwerty123443

I wonder when this will hit the repos? This feature would be a game changer for folks like me who run zstd on an NVMe drive.

Dalcinrafa avatar Sep 30 '22 01:09 Dalcinrafa

I wonder when this will hit the repos? This feature would be a game changer for folks like me who run zstd on an NVMe drive.

Why would it be a game changer? You can already remount with a different compression level and run defrag on the folder. The feature here would just make that easier to do.

John-Gee avatar Sep 30 '22 02:09 John-Gee

Wouldn't that be inefficient, though? It would be redundant to make a mount profile for each file just to get a custom/forced compression, when btrfs defrag could change the attribute and compress at a certain level automatically. Also, you can't unmount a drive that is actively having data added and read, just for one stale file or one folder of files.

[bump]

joshuachris2001 avatar Oct 18 '22 02:10 joshuachris2001

The patch makes it work in the kernel but lacks a few things, namely backward-compatibility checks; the rest are basically coding-style issues:

  • extending the structure must be done by adding a new flag, and only then is the level parsed
  • the level must be verified against the maximum for each compression type
  • grabbing another u32 is wasteful when we already have a u32 for compression and the values only go from 0 to 4, so we can use the upper byte for extensions like the level or telling defrag to 'nocompress'
  • type + level needs to be wrapped in helpers; the format with the shift by 4 is arbitrary (see the sketch below)
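
A sketch of such helpers, assuming the level goes in the upper byte of the existing compression u32; all names here are illustrative, not from any patch:

#include <linux/types.h>	/* __u32, when built outside the kernel tree */

#define DEFRAG_COMPRESS_TYPE_MASK	0x000000ffU
#define DEFRAG_COMPRESS_LEVEL_SHIFT	24

/* Pack a compression type and level into a single u32. */
static inline __u32 defrag_compress_pack(__u32 type, __u32 level)
{
	return (type & DEFRAG_COMPRESS_TYPE_MASK) |
	       (level << DEFRAG_COMPRESS_LEVEL_SHIFT);
}

static inline __u32 defrag_compress_type(__u32 compress)
{
	return compress & DEFRAG_COMPRESS_TYPE_MASK;
}

static inline __u32 defrag_compress_level(__u32 compress)
{
	return (compress >> DEFRAG_COMPRESS_LEVEL_SHIFT) & 0xff;
}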

kdave avatar Oct 21 '22 14:10 kdave

Stale files could be compressed at a higher level without much loss of performance in case they need to be accessed. Would be nice to be able to compress files with more effort, if you don't care about the speed.

There are plenty of compression algorithms whose higher levels don't noticeably affect decompression speed, and zstd is definitely one of them (see https://openzfs.org/w/images/b/b3/03-OpenZFS_2017_-_ZStandard_in_ZFS.pdf, slide 9). Falloff beyond 19 is somewhat expected, but it might not actually translate to small-block use (because different parameters are used there).

Just a guess, but maybe it has to do with the fact that btrfs compresses files in 128 KiB "ranges",

https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h shows that unique levels are defined for 128 KiB blocks all the way up to 22 (whatever snake-oil factor may be in there). Plus the fast ones.

Zstd also supports negative compression levels now with --fast, for even faster compression along the lines of lz4, but we can't specify those with btrfs either.

Agreed that the even faster stuff can be useful to match SSD speed: some compression means less writing than none. Yes, even at the cost of doing no Huffman coding at all on the negative levels: LZ4 does that and turned out fine.

@qwerty123443 patch u32

Uhhh, make it signed maybe?

@kdave squeeze into compress_type

We need about 6 bits if we want to go from the negative levels up to 22, or 5 if you're happy to squeeze and map them to {-10..-1} and {1..22}. Going too far into negative territory is no good anyway, as one would expect. (Still, why bother with the bit-clutching? Just throw a whole byte in, or reuse the << 4 format.)
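
A 5-bit mapping would look something like this (purely illustrative: 32 codes covering {-10..-1} and {1..22}, skipping level 0, which means "use the default"):

static inline unsigned int zstd_level_to_code(int level)
{
	return level < 0 ? (unsigned int)(level + 10)	/* -10..-1 -> 0..9   */
			 : (unsigned int)(level + 9);	/*   1..22 -> 10..31 */
}

static inline int zstd_code_to_level(unsigned int code)
{
	return code < 10 ? (int)code - 10 : (int)code - 9;
}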

The good news is that kernel zstd is new enough to have negative levels: ZSTD_minCLevel is there!

Artoria2e5 avatar Jun 14 '23 12:06 Artoria2e5

Please put that in.

SteavenGamerYT avatar Oct 14 '23 19:10 SteavenGamerYT