Performance of blockdir prefix directories

Open sourcefrog opened this issue 3 years ago • 1 comments

One more thought from #177, cc @road2react and @WolverinDEV:

Conserve's current format puts blocks into subdirectories with a 3-hex-digit name, taken from the first 12 bits of the hash. So there are up to 1<<12 or 4096 of them. This introduces a blocking mkdir ahead of writing each block file.
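
To make the layout concrete, here is a minimal sketch of how a block's relative path could be derived from its hash. The names `block_relpath` and `PREFIX_HEX_DIGITS` are hypothetical, and it assumes the block file is named by its full hex hash inside the prefix directory; this is not Conserve's actual code.

```rust
use std::path::PathBuf;

/// Leading hex digits used as the subdirectory name (3 digits = 12 bits).
/// Hypothetical constant, for illustration only.
const PREFIX_HEX_DIGITS: usize = 3;

/// Hypothetical helper: relative path of a block file inside the block directory.
/// Assumes `hex_hash` is an ASCII hex string at least 3 characters long,
/// and that the file is named by the full hash (an assumption).
fn block_relpath(hex_hash: &str) -> PathBuf {
    let prefix = &hex_hash[..PREFIX_HEX_DIGITS];
    PathBuf::from(prefix).join(hex_hash)
}
```

Three hex digits give 16^3 = 4096 possible subdirectories, which matches the 1<<12 figure.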

The point of this is to reduce the size of any single directory, although that is probably less of a concern on most local filesystems than in years past. It may actually help with rclone/Box, if the client regularly reads whole directories. It may still be a good idea for VFAT USB drives.

It's probably a loss on scalable local filesystems? In particular walking the list of blocks needs to read up to 4096 directories.

There are several options, in order of priority:

  1. Remember which subdirectories are known to exist (because we already wrote or saw a block in them) and then there's no need to create them.
  2. In addition, at the start of a backup, read the block directory to see which prefixes are present and remember them. This has the added benefit of quickly answering whether a given hash can possibly be present.
  3. Make it tunable so that we can at least experiment with different settings, where 0 means no subdirectories. (It should be stored in some archive metadata. It may not be worth allowing this to be changed once the archive exists.)
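
Options 1 and 2 could look roughly like the sketch below. `PrefixCache` and its methods are hypothetical names, not part of Conserve, and the sketch assumes a local filesystem and a single writer; with concurrent writers, a negative answer from `may_contain` would only be a hint.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cache of prefix subdirectories known to exist.
struct PrefixCache {
    root: PathBuf,
    known: HashSet<String>,
}

impl PrefixCache {
    /// Option 2: list the block directory once and remember each subdirectory.
    fn scan(root: &Path) -> io::Result<Self> {
        let mut known = HashSet::new();
        for entry in fs::read_dir(root)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                if let Some(name) = entry.file_name().to_str() {
                    known.insert(name.to_owned());
                }
            }
        }
        Ok(PrefixCache { root: root.to_owned(), known })
    }

    /// Option 1: only issue a mkdir for prefixes not already known.
    /// Returns true if a directory creation was actually attempted.
    fn ensure(&mut self, prefix: &str) -> io::Result<bool> {
        if self.known.contains(prefix) {
            return Ok(false);
        }
        // create_dir_all also succeeds if the directory already exists.
        fs::create_dir_all(self.root.join(prefix))?;
        self.known.insert(prefix.to_owned());
        Ok(true)
    }

    /// If the prefix directory is not known, no block with that hash
    /// can be present (under the single-writer assumption).
    fn may_contain(&self, hex_hash: &str) -> bool {
        hex_hash.get(..3).map_or(false, |p| self.known.contains(p))
    }
}
```

With this shape, a backup pays for one directory listing up front and at most one mkdir per prefix, rather than one mkdir attempt per block.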

I mention the first two first because they are direct efficiency wins that don't require a format change or guessing what's likely to be optimal in any situation, or making the user guess.

sourcefrog, Aug 13 '22 22:08

#179 seems pretty interesting, but I don't have the time to join the conversation right now.
After #173 I wanted to focus on performance (I'm into bug hunting) and encryption.
I'll probably respond within the week (I'm working weekends).

WolverinDEV, Aug 14 '22 00:08