Skip unused blocks when making images of block devices

Open FSMaxB opened this issue 8 years ago • 9 comments

I'm not sure if this is already implemented.

Essentially, it would be really useful if casync made use of the information the filesystem has about which blocks of a partition are actually used and which aren't. This would save huge amounts of space in the image if the filesystem doesn't contain a lot of data.

Similar to what partimage does.

FSMaxB avatar Jul 19 '17 12:07 FSMaxB

Hello, casync uses the information in the filesystem superblock to find which blocks of a partition are actually used and reduces the size of the image accordingly.

You can find more details about the implementation in the encoder code here

briquet avatar Jul 19 '17 14:07 briquet

That's not what I meant. If a file system is not completely filled then a lot of blocks are not used but still contain previously used data, usually not zero.

From my understanding, partimage creates a map that says which blocks are used and which aren't, either by recursively following the filesystem's data structures or by using whatever mechanism the filesystem provides to find out which blocks are free to be reallocated for new data.

This map can then be used to ignore blocks and omit them from the image (in the casync case it would be chunks that only contain zeroes, thereby producing a lot of identical chunks that also have a high compression ratio).

FSMaxB avatar Jul 19 '17 15:07 FSMaxB

Hmm, I figure we could add some magic there, and consider unallocated bytes to be zero. They'd still appear in the serialization, but as all zeroes, and the natural block deduplication would only store a single chunk for them. This would make things more reproducible, and we would benefit from redundancy in the file system image.

poettering avatar Jul 24 '17 11:07 poettering
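
The idea above can be illustrated with a minimal sketch. It assumes a per-block usage map is already available from somewhere (the `used` list is hypothetical input, not something casync provides), and it uses fixed-size chunks with SHA-256 purely for illustration, whereas casync itself uses content-defined chunking. `BLOCK_SIZE`, `CHUNK_SIZE`, `zero_unallocated` and `unique_chunks` are names invented for this example.

```python
import hashlib

BLOCK_SIZE = 4096    # assumed filesystem block size
CHUNK_SIZE = 16384   # assumed fixed chunk size (a simplification)


def zero_unallocated(image: bytes, used: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Return a copy of `image` in which every block not marked used is all zeroes."""
    out = bytearray(image)
    for index, in_use in enumerate(used):
        start = index * block_size
        if start >= len(out):
            break  # usage map is longer than the image; nothing left to zero
        if not in_use:
            end = min(start + block_size, len(out))
            out[start:end] = bytes(end - start)
    return bytes(out)


def unique_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Count distinct chunks, i.e. how many chunks a deduplicating store would keep."""
    digests = {
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    }
    return len(digests)


# 16 blocks with distinct stale content; only the first 4 are still allocated.
image = b"".join(bytes([i + 1]) * BLOCK_SIZE for i in range(16))
used = [True] * 4 + [False] * 12
cleaned = zero_unallocated(image, used)
```

The cleaned image has the same length as the original, so the serialization still covers the whole device, but all chunks that fall entirely within unallocated space hash to the same value and would be stored once.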

Yes, that is exactly what I meant (and what partimage does). Is there a kernel interface that can be used? Or do you actually need to parse the filesystem?

Also: This should be configurable. Some people want to make backups of their deleted files as well.

FSMaxB avatar Jul 24 '17 19:07 FSMaxB

There's no API for that. The file system parser would have to be written by hand.

poettering avatar Jul 25 '17 07:07 poettering

Not 100% true. There's FS_IOC_GETFSMAP, which is painful to use, but still better than writing FS parsers by hand.

magcius avatar Jul 29 '17 04:07 magcius
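
Whatever mechanism supplies the allocation information, the remaining step is to turn it into the ranges that may be treated as zero. The sketch below assumes the used space has already been obtained as a list of `(offset, length)` byte extents; that input format and the `free_extents` helper are hypothetical and do not model the actual records returned by FS_IOC_GETFSMAP.

```python
def free_extents(used, device_size):
    """Return the gaps between used extents as a list of (offset, length) tuples.

    `used` is an iterable of (offset, length) byte extents, possibly unsorted
    or overlapping. `device_size` is the total size in bytes.
    """
    free = []
    cursor = 0  # first byte not yet known to be covered by a used extent
    for offset, length in sorted(used):
        if offset > cursor:
            free.append((cursor, offset - cursor))
        cursor = max(cursor, offset + length)
    if cursor < device_size:
        free.append((cursor, device_size - cursor))
    return free


gaps = free_extents([(8192, 4096), (0, 4096)], 16384)
```

Each returned gap could then be serialized as zeroes instead of being read from the device.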

@magcius hmm, well, but that would mean we'd have to mount all file systems first, in order to then read them back on the block layer... not sure I like that... What I was intending to say is that there's no API for just querying the kernel or anything else for where an unmounted file system's data is.

poettering avatar Jul 31 '17 14:07 poettering

Is it a problem to just mount it readonly?

FSMaxB avatar Jul 31 '17 14:07 FSMaxB

This will be complicated by the fact that typical block devices may also contain partitions with filesystems.

ghost avatar Aug 01 '17 01:08 ghost
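
To illustrate the point about partitions: before any per-filesystem free-space handling, a tool would first have to find where each partition starts and ends. The sketch below reads a classic MBR partition table from the first sector of a device; it assumes 512-byte sectors and does not handle GPT, extended partitions, or nested layouts. `mbr_partitions` is a name invented for this example.

```python
import struct

SECTOR_SIZE = 512  # assumed logical sector size


def mbr_partitions(first_sector: bytes):
    """Return (number, type, byte_offset, byte_length) for each primary MBR partition."""
    if len(first_sector) < 512 or first_sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for index in range(4):
        # The four 16-byte partition entries start at byte 446.
        entry = first_sector[446 + 16 * index:446 + 16 * (index + 1)]
        part_type = entry[4]
        # Start LBA and sector count are little-endian 32-bit values at bytes 8..15.
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if part_type != 0 and sector_count > 0:
            partitions.append(
                (index + 1, part_type, start_lba * SECTOR_SIZE, sector_count * SECTOR_SIZE)
            )
    return partitions
```

Each returned byte range would then need its own filesystem detection and free-space lookup, which is where the extra complexity comes from.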