
xfs file allocation hint can leave unused space allocated on the disk

Open avikivity opened this issue 7 years ago • 5 comments

According to the XFS developers, space allocated with the extent size hint is not reclaimed after the file is truncated and closed. This is triggered via file_open_options::extent_allocation_size_hint.

We should fix this by calling fallocate(FALLOC_FL_PUNCH_HOLE) to reclaim that space after the file is truncated.
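A minimal sketch of what that could look like, assuming the caller knows the final file size and the extent size hint in bytes (with `hint > 0`). The helper names `hint_tail_len` and `truncate_and_punch` are hypothetical and are not Seastar APIs. Note that `FALLOC_FL_PUNCH_HOLE` has to be combined with `FALLOC_FL_KEEP_SIZE`, so the size set by `ftruncate()` is preserved.

```cpp
#include <fcntl.h>      // fallocate(), FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE
#include <sys/types.h>  // off_t
#include <unistd.h>     // ftruncate()

// Bytes between `size` and the next multiple of `hint` (0 if already aligned).
// This is the tail an extent size hint could leave allocated past EOF.
// Assumes hint > 0.
inline off_t hint_tail_len(off_t size, off_t hint) {
    off_t rem = size % hint;
    return rem == 0 ? 0 : hint - rem;
}

// Hypothetical helper: truncate to `size`, then punch out the hint-aligned
// tail beyond EOF. Returns 0 on success, -1 with errno set on failure
// (for example EOPNOTSUPP if the filesystem cannot punch holes).
inline int truncate_and_punch(int fd, off_t size, off_t hint) {
    if (::ftruncate(fd, size) != 0) {
        return -1;
    }
    off_t tail = hint_tail_len(size, hint);
    if (tail == 0) {
        return 0;  // size is already hint-aligned, nothing to punch
    }
    return ::fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, size, tail);
}
```

For a 255238144-byte file and a 32M hint, `hint_tail_len` gives 13197312 bytes, which is the range the punch would cover.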

avikivity avatar Nov 29 '17 16:11 avikivity

It seems that XFS is smart enough nowadays to reclaim the wasted disk space on truncate().

This is a 244M file generated by Scylla:

$ xfs_bmap -vvp ~/tmp/1/keyspace4/standard1-e7445bc0b29911eca2e3e884db8b5af7/me-15-big-Data.db 
/home/raphaelsc/tmp/1/keyspace4/standard1-e7445bc0b29911eca2e3e884db8b5af7/me-15-big-Data.db:
 EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET               TOTAL FLAGS
   0: [0..65535]:       141815824..141881359  0 (141815824..141881359)  65536 000000
   1: [65536..262143]:  146580720..146777327  0 (146580720..146777327) 196608 000000
   2: [262144..498511]: 147002032..147238399  0 (147002032..147238399) 236368 000000

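With an extent size hint of 32M, the file should be wasting about 12M in the last extent, but it turns out that it's actually only taking (236368+196608+65536)×512÷(1024^2) = ~243.4M, which is the file size itself (the 244M above, rounded up), so nothing extra is allocated past EOF.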
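The arithmetic can be double-checked with a small C++ sketch. It assumes, as the calculation above does, that the TOTAL column of `xfs_bmap` is in 512-byte units; the function names are made up for illustration.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Sum the TOTAL column of xfs_bmap (512-byte units) and convert to bytes.
inline std::uint64_t allocated_bytes(const std::vector<std::uint64_t>& extent_totals) {
    const std::uint64_t sectors =
        std::accumulate(extent_totals.begin(), extent_totals.end(), std::uint64_t{0});
    return sectors * 512;
}

// Round `size` up to the next multiple of `hint` (assumes hint > 0).
// This is what the file would occupy if the hint-sized tail were kept.
inline std::uint64_t round_up_to_hint(std::uint64_t size, std::uint64_t hint) {
    return (size + hint - 1) / hint * hint;
}
```

With the three extents listed above, `allocated_bytes({65536, 196608, 236368})` is 255238144 bytes, while `round_up_to_hint(255238144, 32 << 20)` is 268435456 bytes (256M). The difference, 13197312 bytes (about 12.6M), is the waste that would be expected but is not observed.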

@avikivity FYI

raphaelsc avatar Apr 02 '22 15:04 raphaelsc

That's confirmed by Brian Foster:

> Is ftruncate() sufficient to release extents past-the-end, or do we need an
> extra FALLOC_FL_PUNCH_HOLE?

Yes, a truncate trims post-eof blocks.
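For anyone who wants to verify this from userspace without `xfs_bmap`, here is a rough sketch that compares `st_blocks` with `st_size` after the truncate. The helper name is hypothetical, and the check is only approximate: `st_blocks` is counted in 512-byte units and includes rounding up to the filesystem block size, so a small nonzero result is normal, whereas a leftover hint-sized tail would show up as megabytes.

```cpp
#include <sys/stat.h>   // fstat(), struct stat
#include <cstdint>

// Bytes allocated on disk beyond the logical file size, according to fstat().
// Returns 0 if the allocation does not exceed the size (e.g. sparse files),
// or -1 if fstat() fails.
inline std::int64_t bytes_allocated_past_size(int fd) {
    struct stat st;
    if (::fstat(fd, &st) != 0) {
        return -1;
    }
    const std::int64_t allocated = static_cast<std::int64_t>(st.st_blocks) * 512;
    const std::int64_t size = static_cast<std::int64_t>(st.st_size);
    return allocated > size ? allocated - size : 0;
}
```

Calling this on the descriptor right after `ftruncate()` and before `close()` would show whether any post-EOF allocation survived the truncate.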

raphaelsc avatar Apr 02 '22 16:04 raphaelsc

Any idea when this was fixed?

avikivity avatar Apr 03 '22 09:04 avikivity

> Any idea when this was fixed?

@avikivity

I checked out Linux 4.0, and there xfs_setattr_size() does:

	if (newsize <= oldsize) {
		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);

xfs_itruncate_extents() is documented as follows:

 * Free up the underlying blocks past new_size.  The new size must be smaller
 * than the current size.  This routine can be used both for the attribute and
 * data fork, and does not modify the inode size, which is left to the caller.

Right below that in xfs_setattr_size(), we have the following:

		/* A truncate down always removes post-EOF blocks. */
		xfs_inode_clear_eofblocks_tag(ip);

Interestingly, git blame on the line above reveals this:

commit 27b52867925e3aaed090063c1c58a7537e6373f3
Author: Brian Foster <[email protected]>
Date:   Tue Nov 6 09:50:38 2012 -0500

    xfs: add EOFBLOCKS inode tagging/untagging
    
    Add the XFS_ICI_EOFBLOCKS_TAG inode tag to identify inodes with
    speculatively preallocated blocks beyond EOF. An inode is tagged
    when speculative preallocation occurs and untagged either via
    truncate down or when post-EOF blocks are freed via release or
    reclaim.

raphaelsc avatar Apr 04 '22 15:04 raphaelsc

That's Linux 3.8. We've never used anything older than 3.10 (RHEL 7), so probably there is an additional fix later.

avikivity avatar Apr 05 '22 10:04 avikivity