Skip unused blocks when making images of block devices
I'm not sure if this is already implemented.
Essentially, it would be really useful if casync made use of the information the filesystem has about which blocks of a partition are actually used and which aren't. This would save huge amounts of space in the image if the filesystem doesn't contain a lot of data.
Similar to what partimage does.
Hello, casync uses the information in the filesystem superblock to find which blocks of a partition are actually used and reduces the size of the image accordingly.
You can find more details about the implementation in the encoder code here
That's not what I meant. If a file system is not completely filled then a lot of blocks are not used but still contain previously used data, usually not zero.
From my understanding, what partimage does is create a map of which blocks are used and which aren't, either by recursively following the filesystem's data structures or by using whatever mechanism the filesystem provides to find out which blocks are free to be reallocated for new data.
This map can then be used to ignore unused blocks and omit them from the image (in the casync case they would become chunks that only contain zeroes, producing a lot of identical chunks that also compress very well).
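To make the idea of such a map concrete, here is a minimal sketch. It assumes the used extents are already known as `(offset, length)` byte ranges; how they are obtained from the filesystem is the open question in this thread. The function name `free_extents` is hypothetical and is not taken from partimage or casync.

```python
def free_extents(used, device_size):
    """Return the gaps between used extents as (offset, length) pairs.

    `used` is a list of (offset, length) byte ranges that the filesystem
    reports as allocated (assumed input; this sketch does not parse anything).
    """
    free = []
    cursor = 0  # first byte not yet known to be covered by a used extent
    for offset, length in sorted(used):
        if offset > cursor:
            # Everything between the cursor and this extent is unallocated.
            free.append((cursor, offset - cursor))
        cursor = max(cursor, offset + length)
    if cursor < device_size:
        # Trailing unallocated space up to the end of the device.
        free.append((cursor, device_size - cursor))
    return free


# Two used 4 KiB blocks on a 20 KiB device.
print(free_extents([(0, 4096), (8192, 4096)], 20480))
# → [(4096, 4096), (12288, 8192)]
```

Sorting the extents and tracking a cursor also handles overlapping or out-of-order input.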
Hmm, I figure we could add some magic there, and consider unallocated bytes to be zero. They'd still appear in the serialization, but as all zeroes, and the natural block deduplication would only store a single chunk for them. This would make things more reproducible, and we would benefit from the redundancy in the file system image.
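A rough illustration of that effect, not casync's actual code: the sketch below assumes fixed-size 4 KiB chunks and SHA-256 digests purely for simplicity (casync itself uses content-defined chunking), and it assumes the free ranges are already known. `zero_free_ranges` and `unique_chunks` are hypothetical helper names.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, only to keep the sketch short


def zero_free_ranges(image, free):
    """Return a copy of `image` with every (offset, length) free range zeroed."""
    buf = bytearray(image)
    for offset, length in free:
        # Clamp to the image size so the buffer never grows.
        start = min(offset, len(buf))
        end = min(offset + length, len(buf))
        buf[start:end] = bytes(end - start)
    return bytes(buf)


def unique_chunks(image):
    """Return the set of digests of all fixed-size chunks in `image`."""
    return {
        hashlib.sha256(image[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(image), CHUNK_SIZE)
    }


# One live chunk followed by three chunks of stale, non-zero leftovers.
image = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE + b"D" * CHUNK_SIZE
free = [(CHUNK_SIZE, 3 * CHUNK_SIZE)]

print(len(unique_chunks(image)))                          # → 4
print(len(unique_chunks(zero_free_ranges(image, free))))  # → 2
```

After zeroing, the three stale chunks collapse into a single all-zero chunk, so the store only needs the live chunk plus one zero chunk.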
Yes, that is exactly what I meant (and what partimage does). Is there a kernel interface that can be used? Or do you actually need to parse the filesystem?
Also: This should be configurable. Some people want to make backups of their deleted files as well.
There's no API for that. The file system parser would have to be written by hand.
Not 100% true. There's FS_IOC_GETFSMAP, which is painful to use, but still better than writing FS parsers by hand.
@magcius hmm, well, but that would mean we'd have to mount all file systems first, in order to then read them back on the block layer... not sure I like that... What I meant to say is that there's no API for just querying the kernel or anything else about where an unmounted file system's data is.
Is it a problem to just mount it readonly?
This will be complicated by the fact that typical block devices may also contain partitions with filesystems.