seastar
xfs file allocation hint can leave unused space allocated on the disk
According to the XFS developers, space allocated with the extent size hint is not reclaimed after the file is truncated and closed. This is triggered via file_open_options::extent_allocation_size_hint.
We should fix this by calling fallocate(FALLOC_FL_PUNCH_HOLE) to reclaim that space after the file is truncated.
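For illustration only, the proposed workaround could look roughly like the sketch below. It is Python with ctypes rather than Seastar code; the helper name punch_past_eof and the length argument are made up for this example, and it assumes 64-bit Linux where off_t is 64 bits wide. Per fallocate(2), FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE.

```python
import ctypes
import ctypes.util
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_past_eof(fd, length):
    """Ask the filesystem to drop blocks in [EOF, EOF + length).

    Hypothetical helper, not part of Seastar. Returns True if the
    fallocate() call succeeded, False if it failed or is unavailable.
    The file size itself is never changed (KEEP_SIZE).
    """
    size = os.fstat(fd).st_size
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                           use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError):
        return False  # no usable libc fallocate() on this platform

    # int fallocate(int fd, int mode, off_t offset, off_t len);
    # off_t is assumed to be 64 bits here.
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_longlong, ctypes.c_longlong]
    fallocate.restype = ctypes.c_int

    rc = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                   size, length)
    return rc == 0
```

A caller would pass something like the extent size hint as length after truncating the file. Filesystems that do not support hole punching make the call fail, which this sketch reports as False.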
It seems that XFS is nowadays smart enough to reclaim the wasted disk space on truncate().
This is a 244M file generated by Scylla:
$ xfs_bmap -vvp ~/tmp/1/keyspace4/standard1-e7445bc0b29911eca2e3e884db8b5af7/me-15-big-Data.db
/home/raphaelsc/tmp/1/keyspace4/standard1-e7445bc0b29911eca2e3e884db8b5af7/me-15-big-Data.db:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..65535]: 141815824..141881359 0 (141815824..141881359) 65536 000000
1: [65536..262143]: 146580720..146777327 0 (146580720..146777327) 196608 000000
2: [262144..498511]: 147002032..147238399 0 (147002032..147238399) 236368 000000
With an extent size hint of 32M, the file should be wasting about ~12M in the last extent, but it turns out that it's actually only taking (236368+196608+65536)×512÷(1024^2) = ~243.4M, which is essentially the file size itself.
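The arithmetic above can be reproduced with a few lines of Python. This is just a sketch: the extent totals are copied from the TOTAL column of the xfs_bmap output (512-byte units), and the helper names are made up for this example.

```python
import math

SECTOR_BYTES = 512          # xfs_bmap reports extents in 512-byte units
MIB = 1024 ** 2

# TOTAL column of the three extents listed by xfs_bmap
extent_sectors = [65536, 196608, 236368]


def allocated_mib(sectors):
    """Space actually allocated on disk, in MiB."""
    return sum(sectors) * SECTOR_BYTES / MIB


def hint_aligned_mib(size_mib, hint_mib=32):
    """Size the file would occupy if rounded up to the extent size hint."""
    return math.ceil(size_mib / hint_mib) * hint_mib


actual = allocated_mib(extent_sectors)
padded = hint_aligned_mib(actual)
print(f"allocated:          {actual:.1f} MiB")           # 243.4 MiB
print(f"rounded up to hint: {padded} MiB")               # 256 MiB
print(f"expected waste:     {padded - actual:.1f} MiB")  # 12.6 MiB
```

So if the tail of the last 32M extent were still allocated, xfs_bmap would show about 256M rather than about 243.4M.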
@avikivity FYI
That's confirmed by Brian Foster:
> Is ftruncate() sufficient to release extents past-the-end, or do we need an
> extra FALLOC_FL_PUNCH_HOLE?
Yes, a truncate trims post-eof blocks.
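A quick way to sanity-check this from userspace is to compare st_blocks before and after ftruncate(). The sketch below does not set an extent size hint and does not reproduce speculative preallocation; it only shows that truncating down releases blocks, and the function name is made up for this example.

```python
import os
import tempfile


def blocks_before_after_truncate(directory=None, full=8 << 20, keep=1 << 20):
    """Write `full` bytes, truncate down to `keep` bytes, and return
    (st_blocks before, st_blocks after, final file size).

    st_blocks is counted in 512-byte units. Point `directory` at an
    XFS mount to run the check on XFS.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(full))    # random data, so it is not compressible
            f.flush()
            os.fsync(f.fileno())         # make sure blocks are really allocated
            before = os.fstat(f.fileno()).st_blocks

            os.ftruncate(f.fileno(), keep)
            os.fsync(f.fileno())
            st = os.fstat(f.fileno())
            return before, st.st_blocks, st.st_size
    finally:
        os.unlink(path)


if __name__ == "__main__":
    before, after, size = blocks_before_after_truncate()
    print(f"blocks before: {before}, after: {after}, size: {size}")
```

On a filesystem that trims blocks past EOF on truncate, the second number should drop to roughly keep/512.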
Any idea when this was fixed?
@avikivity
I checked out Linux 4.0, and there xfs_setattr_size() does:
if (newsize <= oldsize) {
error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);
xfs_itruncate_extents() is documented as follows:
* Free up the underlying blocks past new_size. The new size must be smaller
* than the current size. This routine can be used both for the attribute and
* data fork, and does not modify the inode size, which is left to the caller.
Right below that, in xfs_setattr_size(), we have the following:
/* A truncate down always removes post-EOF blocks. */
xfs_inode_clear_eofblocks_tag(ip);
Interestingly, git blame on the line above reveals this:
commit 27b52867925e3aaed090063c1c58a7537e6373f3
Author: Brian Foster <[email protected]>
Date: Tue Nov 6 09:50:38 2012 -0500
xfs: add EOFBLOCKS inode tagging/untagging
Add the XFS_ICI_EOFBLOCKS_TAG inode tag to identify inodes with
speculatively preallocated blocks beyond EOF. An inode is tagged
when speculative preallocation occurs and untagged either via
truncate down or when post-EOF blocks are freed via release or
reclaim.
That's Linux 3.8. We've never used anything older than 3.10 (RHEL 7), so there was probably an additional fix later.