shake icon indicating copy to clipboard operation
shake copied to clipboard

Integrate fiemap support and fixes v3

Open kakra opened this issue 8 years ago • 3 comments

This supersedes #11. It fixes an issue introduced during splitting out patches and rebasing.


This change is Reviewable

kakra avatar Jun 28 '17 21:06 kakra

executive.c, line 53 at r1 (raw file):

  /* Optimisation (on Linux it double the readahead window) */
  posix_fadvise (in_fd, (off_t) 0, (off_t) 0, POSIX_FADV_SEQUENTIAL);
  posix_fadvise (in_fd, (off_t) 0, (off_t) 0, POSIX_FADV_NOREUSE);

man7 says POSIX_FADV_NOREUSE is a no-op

That is possible. It's probably better to actually issue a WONT_NEED afterwards to free the page cache. That said, the man page doesn't always match what the kernel actually does. I found this when working on optimizing Steam fossilize.

The second posix_fadvise() call should probably simply be removed. The first line already doubles the readahead window which should be enough.

executive.c, line 233 at r1 (raw file):

shake_reg_backup_phase (struct accused *a, struct law *l)
{
  posix_fadvise (a->fd, (off_t) 0, (off_t) 0, POSIX_FADV_WILLNEED);

I don't think prefetching just before the actual blocking read would help? As I understand, this is more likely to help if some time separate both calls.

It will hint the cache to prefetch the whole file in advance. Depending on how fast the process can read the file, it may reduce head movement and prefetch metadata. In this case, it's probably not that useful, especially since when there is concurrency with other IO under memory pressure, shake would start dominating the cache - which we probably don't want.

linux.c, line 225 at r1 (raw file):

      sizelog[logs_pos] = -1;
      poslog[logs_pos] = -1;
    }

As discussed in the initial review, I suggest splitting this to a separate function as it's grown quite large with the change.

The whole process is a bit racy: While on ext4, locking the file and rewriting it is probably the way to go but there's still a window when shake can accidentally replace a modified file with the previous version, or even error out and leave a temporary file behind.

For other filesystems, e.g. btrfs (and maybe even xfs) there's an IOCTL to replace the inode with a new set of extents atomically. For xfs, I'd look at how xfs_fsr does it. For btrfs, it's probably worth writing a temporary file with the new contents, then use "extent same" to atomically replace the old extents with the new defragmented ones.

Since shake is not safe to be used unattended, I've stopped using it.

Instead, I'm running on hybrid btrfs now with dedicated SSDs for metadata, and bcache dedicated for data. This works great and provides way better performance than running shake trying to relocate data blocks. The only downside is that shake was able to reduce the amount of extents.

I'd look into shake again if it could be redesigned into a more safe way of operating but the problems are much inherent to its internal design. So if you want to apply the patches partially, feel free to do so. It's still an interesting software if its race conditions can be resolved because it can restore locality of files that belong together, e.g. I'd like to use the readahead data from the boot process to restore locality of all involved files (not on a directory level but order of read during boot).

If you need help extracting the fiemap code into a separate function, I can help with that. Also, it should keep the part to skip extent marked as shared so it won't unshare snapshot extents.

Before starting to merge patches, the code should be run through clang format or indent to fix its quite creative indents. It will make patch rebases much more easy. I'd prefer clang format as it is much more sophisticated and be be integrated into the commit process.

kakra avatar Mar 12 '22 19:03 kakra

If you need help extracting the fiemap code into a separate function, I can help with that. Also, it should keep the part to skip extent marked as shared so it won't unshare snapshot extents.

Thanks for offering. As you noted, the code dates a bit and could do better on modern filesystems. But as you also noted, the use of spinning disks is decreasing to the point that nowadays it's for some specific use cases. These days, people caring about latency use SSDs to serve that. And it seems both of us are doing just that...

So that's the reason why I'm reluctant to making changes myself, and maybe to changes in general:
I have the feeling shake is going to be like a CRT-only software (eg: tempest for eliza). Neat, at its epoch, now a memento. Maybe the correct course here is to update the documentation, mention this context and officially transition it as a "maintenance only" software.

Before starting to merge patches, the code should be run through clang format or indent to fix its quite creative indents. It will make patch rebases much more easy. I'd prefer clang format as it is much more sophisticated and be be integrated into the commit process.

Actually the current indent are indent's default mode of operation :-P That's how they got here. I gave clang-format a quick try. It does have a -style=GNU option, but that option doesn't properly follow GNU style (for example SpaceAfterCStyleCast is false when it ought to be true). I don't feel strongly about the current GNU style, but quickly glancing at some distros that carry shake, they patch it and I'd rather not break their build for no specific reason.

unbrice avatar Mar 13 '22 18:03 unbrice

they patch it

The question is: What do they patch? And would it be rather worth it to integrate those patches into shake upstream?

I think shake still has its use-cases and is based on some great ideas. That's probably why distros still ship it despite the mainstream era of spinning disks has passed, and storage that still uses spinning disks will either cache (using SSDs) or parallelize (by increasing spindles) - and both of those are not the general target audience of what shake tries to do.

Each FS has their own implementation of some defrag but none of them is locality aware as shake is. If shake wants to head anywhere, I think this is its strong selling point. And in that case, a follow-up version should look into how xfs or btrfs implement their atomic defraggers and apply the same (probably fs-specific) logic to shake.

I had an idea of adding some sort of boot-defrag to shake by feeding systemd-readahead data into it. But I quickly dropped that idea when I realized that shake could not run unattended because defrag-problems could go unnoticed (destroying the original file and leaving a temp file in place, it can happen if package updates and shake collide on the same files). This is another area that needs fixing.

kakra avatar Mar 13 '22 22:03 kakra