Files Cache use in Borg Mount
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
Question
System information. For client/server mode post info for both machines.
Both, same machine
Your borg version (borg -V).
1.2.0
Operating system (distribution) and version.
RHEL 9.1 (Rocky Linux)
Hardware / network configuration, and filesystems used.
ext4
How much data is handled by borg?
~360 GB, number of files: 1121938
Archived by borg create --files-cache ctime,size. The repo was created in append-only mode and sits on a networked drive with high latency and relatively slow access speed.
Full borg commandline that lead to the problem (leave away excludes and passwords)
borg --bypass-lock mount repo::arc /mnt/lower
(mount overlayfs with /mnt/lower as the lower layer plus an upper layer, giving /mnt/overlay)
cd /mnt/overlay
(add just a few small files)
borg create --files-cache ctime,size repo::newarc .
Describe the problem you're observing.
Using a separate copy of the borg repo/cache files, with the data directory symlinked to the original copy, I have set up an overlayfs on top of the borg mount.
When a subsequent borg create is executed on the overlayfs (with the borg mount as its lower layer), it seems that instead of taking the metadata (ctime, size) from the files cache, borg reads all the chunks from the repo data directory, even for the unchanged files residing in the borg mount. This makes it very slow in my case because of the slow networked drive.
According to the borg docs, the files cache seems to be used only by borg create, not by borg mount.
Is there any way to force borg mount to use the files cache for metadata instead of the data chunks? In this scenario, the metadata of the unchanged files (residing in the borg mount) could then be read from the files cache to speed things up.
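To make the question concrete, here is a toy model of the check borg create performs against the files cache in "ctime,size" mode. It is a sketch under stated assumptions: a plain dict keyed by absolute path stands in for borg's real on-disk files cache, and CacheEntry and known_and_unchanged are hypothetical names, not borg's API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CacheEntry:
    # Simplified entry; borg's real files cache records more than this.
    ctime_ns: int
    size: int
    chunk_ids: Tuple[str, ...]


def known_and_unchanged(cache: Dict[str, CacheEntry], abs_path: str,
                        ctime_ns: int, size: int) -> Optional[Tuple[str, ...]]:
    """Return cached chunk ids if the file looks unchanged under a
    'ctime,size' comparison; None means its content must be read again."""
    entry = cache.get(abs_path)
    if entry is None:
        return None  # path never seen: full read and chunking
    if entry.ctime_ns == ctime_ns and entry.size == size:
        return entry.chunk_ids  # hit: reuse chunk references, no content read
    return None  # ctime or size changed: full read and chunking
```

On a hit, create can reference the already-stored chunks without reading the file content. On a borg mount, reading file content is what triggers repository reads, so each cache miss costs a full fetch of that file's chunks.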
Thank you for your help and thank you for this great piece of software.
Cheers, zK
you probably need a stable mounting point
Furthermore, pathnames recorded in files cache are always absolute, even if you specify source directories with relative pathname. If relative pathnames are stable, but absolute are not (for example if you mount a filesystem without stable mount points for each backup or if you are running the backup from a filesystem snapshot whose name is not stable), borg will assume that files are different and will report them as ‘added’, even though no new chunks will be actually recorded for them. To avoid this, you could bind mount your source directory in a directory with the stable path.
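A minimal sketch of the effect the docs describe, assuming the files cache is keyed on the absolute path (borg actually keys on a hash of it). cache_key and the mount paths below are hypothetical illustrations.

```python
import posixpath


def cache_key(source_root: str, rel_path: str) -> str:
    # Hypothetical helper: turn a relative path into the absolute path
    # that the files cache is keyed on (borg stores a hash of it).
    return posixpath.normpath(posixpath.join(source_root, rel_path))


# Snapshot mounted at a different place for each backup: keys differ.
run1 = cache_key("/mnt/snap-2023-01-01", "docs/report.txt")
run2 = cache_key("/mnt/snap-2023-01-02", "docs/report.txt")
print(run1 == run2)  # False -> every file looks new to the files cache

# Same snapshots bind-mounted at one stable path: keys match.
run1 = cache_key("/srv/backup-src", "docs/report.txt")
run2 = cache_key("/srv/backup-src", "docs/report.txt")
print(run1 == run2)  # True
```

The relative path "docs/report.txt" is identical in both runs; only the absolute prefix changes, and that alone is enough to miss the cache.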
Thanks for the reply and the heads-up.
I should add that in the above setup, the absolute path stayed the same because I used a bind mount, as suggested by the docs.
Cheers, zK
borg mount does not use the files cache.
when mounting an archive, borg always reads all the metadata into memory to build the filesystem (but not the files content data). so, that metadata should not cause additional repo accesses.
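The behaviour described here (metadata loaded up front, content fetched only on read) can be sketched as a toy model. MountedArchive, the item dict fields and fetch_chunk are assumptions for illustration, not borg's FUSE implementation.

```python
from typing import Callable, Dict, Iterable, Tuple


class MountedArchive:
    """Toy model of mounting one archive: all item metadata is loaded
    up front to build the tree, file content is fetched only on read."""

    def __init__(self, items: Iterable[dict], fetch_chunk: Callable[[str], bytes]):
        self._fetch_chunk = fetch_chunk  # stands in for a repository read
        self.chunk_fetches = 0
        # path -> metadata; built once and kept entirely in memory
        self.tree: Dict[str, dict] = {item["path"]: item for item in items}

    def stat(self, path: str) -> Tuple[int, int]:
        # Answered from memory: no repository access here.
        meta = self.tree[path]
        return meta["ctime"], meta["size"]

    def read(self, path: str) -> bytes:
        # Only here are content chunks pulled from the repository.
        parts = []
        for chunk_id in self.tree[path]["chunks"]:
            self.chunk_fetches += 1
            parts.append(self._fetch_chunk(chunk_id))
        return b"".join(parts)


store = {"c1": b"hello ", "c2": b"world"}
m = MountedArchive([{"path": "a.txt", "ctime": 1, "size": 11, "chunks": ["c1", "c2"]}],
                   store.__getitem__)
print(m.stat("a.txt"), m.chunk_fetches)   # (1, 11) 0
print(m.read("a.txt"), m.chunk_fetches)   # b'hello world' 2
```

Stat calls (which is what a ctime,size comparison needs) never touch the store; only reads do.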
I'm also concerned that it's probably not safe to run borg create on a repo that is currently in use by borg mount.
That doesn't usually work because the repo will be locked by borg mount.
Dear @ThomasWaldmann ,
Thank you for your reply.
borg mount does not use the files cache.
Thanks for confirming this.
when mounting an archive, borg always reads all the metadata into memory to build the filesystem (but not the files content data). so, that metadata should not cause additional repo accesses.
Please shed some more light on this. When borg mount reads all the metadata into memory:
- where does Borg read the metadata from? Is it the repo data chunks (~/data/*)?
- if so, once the mount command is complete and the mount point becomes accessible, does it mean the metadata read is complete?
- in my example, all 1121938 files' metadata would have finished loading by then?
- if so, how much data would need to be read from ~/data for 1121938 files?
- Borg would have to scan and read a majority of files from ~/data?
so, that metadata should not cause additional repo accesses.
That's what I thought and what I am trying to clarify.
Thank you again.
Cheers, zK
- It reads the metadata from the repo (and yes, all backup data and metadata is stored below repo_dir/data/).
- yes, IF you mount a single archive (if you mount the whole repo, it does on-demand loading for each archive)
- yes (if that is the file count in that one archive)
- that depends on the metadata size
- no, usually rather a few files (but can't be said precisely, due to repeated fragmentation/compaction)
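Since the answer to "how much" is "that depends on the metadata size", a back-of-envelope estimate can only be parameterized. The per-item sizes below are hypothetical placeholders, not measured borg values; plug in your own figures.

```python
def estimate_metadata_bytes(n_files: int, avg_item_bytes: float,
                            compression_ratio: float = 1.0) -> float:
    """Back-of-envelope size of the item metadata read when mounting one
    archive. avg_item_bytes and compression_ratio are assumptions you must
    supply; they are not values reported by borg."""
    return n_files * avg_item_bytes / compression_ratio


N_FILES = 1_121_938  # file count from the issue
for avg in (100, 200, 400):  # hypothetical average bytes per item
    mib = estimate_metadata_bytes(N_FILES, avg) / 2**20
    print(f"{avg:>4} bytes/item -> about {mib:,.0f} MiB")
```

The estimate scales linearly with file count, which is why an archive with over a million files can take noticeable time to mount over a slow network drive even though no file content is read.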
Dear @ThomasWaldmann ,
That doesn't usually work because the repo will be locked by borg mount.
Would it be safe in a setup like this example? A separate, fresh copy (taken at mount time) of base/repo/cache, with the ~/data files symlinked to an append-only repo, mounted read-only with --bypass-lock, while a borg create of a new archive runs concurrently. Only one read-only mount and one create would happen at the same time, no more.
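The directory layout described above (a copied repo directory whose data/ points back at the original) can be sketched as follows. make_shadow_repo is a hypothetical helper that only reproduces the layout; the same idea would apply to the cache directory, and the sketch says nothing about whether concurrent use is safe.

```python
import os
import shutil


def make_shadow_repo(original: str, shadow: str) -> None:
    """Hypothetical helper: copy a repo directory except its top-level
    data/ and point data/ back at the original via a symlink."""
    def skip_top_level_data(directory, names):
        # copytree calls this for every directory; only skip data/ at the top
        return ["data"] if directory == original and "data" in names else []

    shutil.copytree(original, shadow, ignore=skip_top_level_data)
    os.symlink(os.path.abspath(os.path.join(original, "data")),
               os.path.join(shadow, "data"))
```

Everything outside data/ (config, index, hints and so on) becomes private to the copy, while the segment files are shared with the original through the symlink.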
Thank you again.
Cheers, zK
I think the important question here is: What are you trying to achieve?
overlayfs does its own caching. Depending on what you are trying to do, and how you do it, you will likely get served stale file content, and I don't think you will get stable inode numbers with overlayfs.
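On the inode point: borg's default files-cache mode is ctime,size,inode, while this setup uses ctime,size. The toy comparison below shows how leaving inode out makes the check insensitive to changing inode numbers. Stat and unchanged are hypothetical names, and the sketch does not model overlayfs caching or stale content.

```python
from typing import NamedTuple, Sequence


class Stat(NamedTuple):
    inode: int
    ctime_ns: int
    size: int


def unchanged(cached: Stat, current: Stat, mode: Sequence[str]) -> bool:
    """Compare only the fields named in a files-cache mode such as
    ("ctime", "size", "inode") (toy model of the comparison)."""
    fields = {"inode": "inode", "ctime": "ctime_ns", "size": "size"}
    return all(getattr(cached, fields[m]) == getattr(current, fields[m]) for m in mode)


before = Stat(inode=1001, ctime_ns=5, size=42)
after = Stat(inode=7, ctime_ns=5, size=42)  # same file, new inode number after remount
print(unchanged(before, after, ("ctime", "size", "inode")))  # False
print(unchanged(before, after, ("ctime", "size")))           # True
```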
Guess this can be closed.