
Introduce subdirectories inside of data/ to reduce the number of files in a single directory

Open mjjohnson opened this issue 2 years ago • 8 comments

Right now all of the data files in a repository go into a single directory (data/). Given an average chunk size of 1MB, a 10TB repository would mean 10 million files end up in a single directory.

Some tools can struggle to handle very large numbers of files in a directory if they try to hold information about every file in memory at once. (Similar to #314, although in that case it took a lot more than 10 million files to cause issues.) For instance, running ls on millions of files is a bad idea. And at some point it's apparently possible to hit actual filesystem limits; I ran across this example of someone experiencing issues on ext4 with a directory that had around 5 million files: https://adammonsen.com/post/1555/

What about using a hierarchical approach, with a few levels of subdirectories inside of data/ to split things up based on the leading characters in each filename? For instance, a file named 0123456789abcdef would be at the path data/0/1/2/0123456789abcdef.

This would reduce the expected number of files in any single leaf directory by a factor of 16^3 = 4096, so for instance a 10TB repository would have around 2400 files per leaf directory.
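For illustration, here is a minimal Rust sketch of that mapping (the chunk_path helper and its depth parameter are hypothetical names for this example, not bupstash's actual code); with depth = 3 you get the 16^3 = 4096 leaf directories mentioned above:

```rust
use std::path::PathBuf;

/// Hypothetical helper: map a hex-named chunk file to a nested path under
/// data/, using the first `depth` hex characters as one directory level each.
/// With depth = 3, "0123456789abcdef" maps to "data/0/1/2/0123456789abcdef".
fn chunk_path(data_dir: &str, chunk_name: &str, depth: usize) -> PathBuf {
    let mut p = PathBuf::from(data_dir);
    for c in chunk_name.chars().take(depth) {
        p.push(c.to_string());
    }
    p.push(chunk_name);
    p
}

fn main() {
    let p = chunk_path("data", "0123456789abcdef", 3);
    assert_eq!(p, PathBuf::from("data/0/1/2/0123456789abcdef"));
    println!("{}", p.display());
}
```

A flatter scheme (e.g. one level of two hex characters, as git does) is just a different point on the same trade-off between fan-out and number of directories.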

Thoughts?

mjjohnson avatar Nov 03 '22 02:11 mjjohnson

Something like this is definitely coming - it's a good idea. Multi-TB repositories are becoming more and more common.

andrewchambers avatar Nov 03 '22 04:11 andrewchambers

+1

git uses hash-prefix subdirs based on the first 2 hex characters (8 bits) of the hash (a hash such as efdeadbeef would be stored in ef/deadbeef).

For older filesystems, this helps avoid limits on the maximum number of files per directory, and performance bottlenecks dealing with large directories (large dirs can be really painful in filesystems that don't use b-trees).

Even for newer filesystems that have efficient tree lookup and insertion, using hash prefix subdirs can still be a great help because it can reduce filesystem lock contention. If N threads all want to add a new file to a directory, they can all end up serializing on a filesystem write lock for that directory. Spreading the files out over 256 subdirs (00 through ff) is an easy way to give those N threads 256 possible directory locks instead of just 1, which greatly reduces the chances that two or more of them will contend for the same lock in the filesystem code.

smferris avatar Jan 09 '23 23:01 smferris

The main concern is how to deal with directory fsyncing - it's actually sort of an interesting challenge. Adding 256 dirs means we need a minimum of 256 open file handles to follow the strictest fsync semantics, which can push us above the default ulimit.

A likely solution is to make this configurable, with a lower default.
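To make the fsync concern concrete, here is a rough sketch of a "strict" per-chunk write path under such a layout (the function and names are illustrative, not bupstash's actual code):

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Illustrative strict write path: write the chunk, fsync it, then fsync the
/// containing subdirectory so the new directory entry is durable as well.
fn write_chunk_durably(subdir: &Path, name: &str, data: &[u8]) -> io::Result<()> {
    fs::create_dir_all(subdir)?;

    let path = subdir.join(name);
    let mut f = OpenOptions::new().write(true).create_new(true).open(&path)?;
    f.write_all(data)?;
    f.sync_all()?; // fsync the chunk file itself

    // fsync the parent directory so the entry for `name` is on disk too.
    let dir = File::open(subdir)?;
    dir.sync_all()?;
    Ok(())
}
```

Reopening and fsyncing the subdirectory for every chunk, as above, is the fd-cheap variant; caching one long-lived handle per subdirectory is faster, but is exactly where the 256-open-handles minimum comes from.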

andrewchambers avatar Jan 10 '23 02:01 andrewchambers

The default soft limit is low so that people notice fd leaks, and because most processes don't need to keep many files open. A process that does need a lot of open files can increase the limit by calling setrlimit(RLIMIT_NOFILE, ...), up to the hard limit (which only root can raise).

The hard limits are high enough not to seem like a problem to me, at least on the macOS and Linux systems I have at the moment:

macOS 11.7.2: ulimit -Hn == unlimited
Arch Linux: ulimit -Hn == 524288
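For what it's worth, bumping the soft limit up to the hard limit at startup is only a few lines. A sketch using the libc crate (an assumption for this example; this is not something bupstash necessarily does today):

```rust
/// Raise the soft RLIMIT_NOFILE up to the hard limit. No root needed, since
/// only raising the hard limit itself requires privileges.
fn raise_nofile_limit() -> std::io::Result<()> {
    unsafe {
        let mut lim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        lim.rlim_cur = lim.rlim_max;
        // Note: on macOS the kernel may still cap the effective limit even
        // when the hard limit reports as unlimited.
        if libc::setrlimit(libc::RLIMIT_NOFILE, &lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}
```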

smferris avatar Jan 10 '23 07:01 smferris

#130

ptman avatar Aug 03 '23 08:08 ptman

Just wanted to second this request, as I have terabytes of data that I would like to back up using bupstash. I am hesitant to proceed, though, due to the extreme number of files that would be generated in one folder.

lenzj avatar Mar 03 '24 23:03 lenzj

Have been a bit busy, but this change will be coming. I have an implementation in progress.

andrewchambers avatar Mar 04 '24 10:03 andrewchambers

I just ran into an issue with about 5.2 million files in the data directory on ext4 and was scratching my head for a minute, since there was still plenty of space and plenty of inodes left. I managed to get it working again with tune2fs -O large_dir [block-device].

AndrolGenhald avatar Apr 04 '24 14:04 AndrolGenhald