
Incrementally deallocate temp file during reading using fallocate

Open Mortal opened this issue 7 years ago • 0 comments

On Linux, one can use fallocate to "punch a hole" in a file, that is, deallocate parts of a large file while keeping its apparent size. Could TPIE use this mechanism to free temp files incrementally as they are read (provided the user does not need to read them again, e.g. in sorting) and thus save temp space?

The fallocate(1) user command says this:

-p, --punch-hole

Deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for length bytes. Within the specified range, partial filesystem blocks are zeroed, and whole filesystem blocks are removed from the file. After a successful call, subsequent reads from this range will return zeroes. This option may not be specified at the same time as the --zero-range option. Also, when using this option, --keep-size is implied.

Supported for XFS (since Linux 2.6.38), ext4 (since Linux 3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).

It sounds like the Thrill framework does this (arXiv:1608.05634v1, page 7):

In Thrill we took pipelining of data processing one step further by enabling consumption of source DIA storage while pushing data to the next operation. DIA operations transform huge data sets, but a naive implementation would read all items from one DIA, push them all into the pipeline for processing, and then deallocate the data storage. Assuming the next operation also stores all items, this requires twice the amount of storage. However, with consume enabled, the preceding DIA operation’s storage is deallocated while processing the items, hence the storage for all items is needed only once, plus a small overlapping buffer.
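To make the storage claim in that quote concrete, here is a toy model (not Thrill or TPIE code; peak_blocks is a hypothetical name) that counts how many blocks are allocated at the worst moment when n blocks are streamed from a source into a destination that stores all of them.

```cpp
#include <algorithm>
#include <cstddef>

// Toy model: returns the peak number of allocated blocks while copying
// n blocks from a source store to a destination store.
// consume == false: the source is freed only after everything is copied.
// consume == true:  each source block is freed right after it is copied.
std::size_t peak_blocks(std::size_t n, bool consume) {
    std::size_t source = n, dest = 0, peak = n;
    for (std::size_t i = 0; i < n; ++i) {
        ++dest;                              // write block to destination
        peak = std::max(peak, source + dest);
        if (consume) --source;               // release the block just read
    }
    return peak;
}
```

Under these assumptions, peak_blocks(1000, false) gives 2000 and peak_blocks(1000, true) gives 1001, which matches "needed only once, plus a small overlapping buffer".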

Mortal, Aug 04 '17 08:08