MergerFS doesn't merge symlinked directories.
General description
MergerFS doesn't merge symlinked directories.
Expected behavior
On different drives, I have a directory named timeline. In that directory I have symlinks with names like 'today', 'yesterday', 'day before yesterday', 'this week', 'last week', 'this month' and 'last month', pointing to directories with snapshots of these devices (with less convenient names). When I merge-mount these devices with mergerfs, I want mergerfs to merge the content of these symlinked directories (at least as an option).
Actual behavior
When you merge two folders that contain identically named symlinks, only the content of the first folder found is shown.
In the following example, /NAS/ND[12]/Today are symlinks to other directories.
root@system:/NAS/ND1/timeline# cd /NAS/ND1/timeline/Today
root@system:/NAS/ND1/timeline/Today# dir
test.txt
root@system:/NAS/ND1/timeline/Today# cd /NAS/ND2/timeline/Today
root@system:/NAS/ND2/timeline/Today# dir
hi.txt hello.txt install.sh install.sh.~1~ Test
root@system:/NAS/ND2/timeline/Today# mergerfs -o allow_other,use_ino,func.access=all,nonempty "/NAS/ND*/timeline" "/NAS/timeline"
root@system:/NAS/ND2/timeline/Today# cd /NAS/timeline/Today
root@system:/NAS/timeline/Today# dir
test.txt <= Only content of /NAS/ND1/timeline
Precise steps to reproduce the behavior
- [ ] Make in two different folders a symlink with exactly the same name, linking to any directory in any filesystem.
- [ ] Merge-mount two different folders.
- [ ] When you browse the merged filesystem, you will only get the content of the first folder found.
System information
Please provide as much of the following information as possible:
- [ ] mergerfs version: 2.24.2
- [ ] mergerfs settings: mergerfs -o allow_other,use_ino,func.access=all,nonempty "$sNASPath/ND*/timeline" "$sMergePath" ($sNASPath/ND* are the mountpoints of different NAS-devices)
- [ ] Linux version: Linux 5.4.42-v8+ aarch64 GNU/Linux
- [ ] List of drives, filesystems, & sizes: BTRFS-filesystems.
Merging symlinked directories is very different from what exists now and somewhat involved. Symlinks are files. As such, mergerfs returns them as they are defined. A symlink does not need to point to anything valid. It is more like a key=value structure where reading the value requires an explicit call to readlink.
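To illustrate the key=value nature described above, here is a minimal sketch. The temporary directory and the link name `dangling` are made up for the example. It creates a symlink whose value does not resolve to anything and reads that value back with readlink.

```shell
#!/usr/bin/env bash
# A symlink stores an arbitrary string; nothing checks that it resolves.
demo_dir=$(mktemp -d)
ln -s "not/a/real/target" "$demo_dir/dangling"

# readlink returns the stored value without resolving it.
link_value=$(readlink "$demo_dir/dangling")
echo "stored value: $link_value"

# -L tests the link itself; -e follows it and fails when the target is missing.
if [[ -L "$demo_dir/dangling" && ! -e "$demo_dir/dangling" ]]; then
    link_state="dangling"
else
    link_state="resolves"
fi
echo "state: $link_state"
rm -rf "$demo_dir"
```

The link exists as a file of its own even though following it fails, which is why a union filesystem can return it as-is without knowing what, if anything, it refers to.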
Doing something like that is pretty involved I think. Especially if you're going to try to keep other symlinks valid. I don't fully understand your setup but couldn't you simply change the symlinks to point to the mergerfs pool or make them relative rather than absolute? I suppose I could add a symlink rewriting feature to replace known branches found in symlinks to that of the mergerfs mount point which would do the same.
I already assumed, it was.
Actually a simple filetree looks like this:
/NAS/ND1/ (drive 1)
├── data
│   └── test.txt
├── snapshots
│   ├── 200617_230145 daily
│   │   └── test.txt
│   ├── 200618_231745 daily
│   │   └── test.txt
│   └── 200619_155419 daily
│       └── test.txt
└── timeline
    ├── Eergisteren -> /NAS//ND1/snapshots/200617_230145 daily
    ├── Gisteren -> /NAS//ND1/snapshots/200618_231745 daily
    └── Vandaag -> /NAS//ND1/snapshots/200619_155419 daily
/NAS/ND2/ (drive 2)
├── data
│   ├── daag.txt
│   ├── hallo.txt
│   ├── install.sh
│   ├── install.sh.~1~
│   └── Test
│       └── Daag
├── snapshots
│   ├── 200617_230145 daily
│   │   ├── daag.txt
│   │   ├── hallo.txt
│   │   ├── install.sh
│   │   └── install.sh.~1~
│   ├── 200618_231745 daily
│   │   ├── daag.txt
│   │   ├── hallo.txt
│   │   ├── install.sh
│   │   ├── install.sh.~1~
│   │   └── Test
│   ├── 200619_155419 daily
│   │   ├── daag.txt
│   │   ├── hallo.txt
│   │   ├── install.sh
│   │   ├── install.sh.~1~
│   │   └── Test
│   └── install.sh
└── timeline
    ├── Eergisteren -> /NAS//ND2/snapshots/200617_230145 daily
    ├── Gisteren -> /NAS//ND2/snapshots/200618_231745 daily
    └── Vandaag -> /NAS//ND2/snapshots/200619_155419 daily
/NAS/timeline (a mergerfs-pool)
├── Eergisteren -> /NAS//ND1/snapshots/200617_230145 daily
├── Gisteren -> /NAS//ND1/snapshots/200618_231745 daily
└── Vandaag -> /NAS//ND1/snapshots/200619_155419 daily
Of course it would be possible to create a script like:
# Unmount and remove the per-entry mergerfs mounts left over from a previous run.
for sMountedMFS in "/NAS/timeline/"* ; do
    mountpoint "$sMountedMFS" && umount "$sMountedMFS" && rmdir "$sMountedMFS"
done
# Collect, per symlink name, a colon-separated list of the resolved targets.
declare -A asPaths
for sTimeLinePoint in "/NAS/ND"*"/timeline/"* ; do
    [[ -L "$sTimeLinePoint" ]] || continue
    asPaths["$( basename "$sTimeLinePoint" )"]+=":$( readlink -f "$sTimeLinePoint" )"
done
# Mount one mergerfs pool per symlink name (":1" strips the leading colon).
for sTimeLinePoint in "${!asPaths[@]}" ; do
    sMountedMFS="/NAS/timeline/$sTimeLinePoint"
    mkdir "$sMountedMFS"
    mergerfs -o allow_other,use_ino,func.access=all,nonempty "${asPaths[$sTimeLinePoint]:1}" "$sMountedMFS"
done
What do you mean by:
Especially if you're going to try to keep other symlinks valid?
I don't fully understand your setup but couldn't you simply change the symlinks to point to the mergerfs pool or make them relative rather than absolute?
No, since I first made them relative (../snapshots/* instead of /NAS/ND?/snapshots/*). But Samba couldn't handle relative symlinks (didn't show them at all in the directory, that only could be helped by making the references absolute).
I suppose I could add a symlink rewriting feature to replace known branches found in symlinks to that of the mergerfs mount point which would do the same.
A rewriting would be nice, but the feature should treat symlinks to directories as if they were the directory themselves (and if it's not valid, that's the user's problem: I keep them valid by rechecking each time when a new snapshot is being created and at each setup of the service).
I already assumed, it was.
Assumed it was what? Rewriting symlinks? No. It's generally not needed unless someone didn't have control over the symlink. It could also break things. Symlinks are arbitrary values. If they are needed to point to something within the pool it's trivial to do so.
What do you mean by: Especially if you're going to try to keep other symlinks valid?
How exactly does it know what symlinks to fake as directories? Symlinks are arbitrary things. They don't have to have a reference to anything valid. Or even paths. What would be the metrics for knowing what symlinks to convert and what not to convert?
But Samba couldn't handle relative symlinks (didn't show them at all in the directory, that only could be helped by making the references absolute).
I've not messed with symlinks in Samba really. Have you tried changing the symlinks to absolute values to the pool? Or do you use them out of the pool and need them to be original?
A rewriting would be nice, but the feature should treat symlinks to directories as if they were the directory themselves (and if it's not valid, that's the user's problem: I keep them valid by rechecking each time when a new snapshot is being created and at each setup of the service).
Those are different things. Rewriting means they are shown as symlinks. The value is just different. And as a blanket rule could have problems if you actually want to keep some symlinks. What if you legit want to point to the original branch?
Treating symlinks to things as the thing themselves means active inspection and rewriting of entry attr info to change it from a symlink to a directory or file and managing the complications with determining if it should be done as mentioned above.
Perhaps I'm missing something but what would be the benefit of treating symlinks as the target over rewriting the symlink?
Assumed it was what?
More complicated than I would have guessed when facing the matter for the first time.
How exactly does it know what symlinks to fake as directories?
Pseudocode:
asMergerPaths = '/path/a:/path/b'
bFollowSymLinks = true
bFollowSymLinksOutSidePool = true
function findPath (sRequestedPath) {
    sFoundPath = ""
    foreach sPath in split(asMergerPaths, ':') {
        sPath = getRealPath(sPath, true)
        sRealPath = getRealPath(sPath + '/' + sRequestedPath, bFollowSymLinks)
        // A file wins only if no directory was found in an earlier branch.
        if (is_file(sRealPath) && (sFoundPath == "")) {
            if (!bFollowSymLinks || bFollowSymLinksOutSidePool || fileResidesInPool(sRealPath)) return sRealPath
        }
        // Directories from all branches are collected into a colon-separated list.
        if (is_directory(sRealPath)) {
            if (!bFollowSymLinks || bFollowSymLinksOutSidePool || fileResidesInPool(sRealPath)) sFoundPath += (sFoundPath == "" ? "" : ":") + sRealPath
        }
    }
    return sFoundPath
}
How exactly does it know what symlinks to fake as directories?
By setting an option: just true/false, a pattern /NAS/ND*/timeline (in my case) or even ugly things like extensions (so it's easy to detect).
Or do you use them out of the pool and need them to be original?
Jup, the pool is mounted at /NAS/timeline, which is aggregated from all /NAS/ND*/timeline-directories pointing to /NAS/ND*/timeline-directories.
Treating symlinks to things as the thing themselves means active inspection and rewriting of entry attr info to change it from a symlink to a directory or file and managing the complications with determining if it should be done as mentioned above.
Indeed, and I can imagine that costs a lot of effort.
I'm not sure you're understanding what I'm asking. The problem isn't finding a random symlink. The problem is managing the many different usecases at one time. A symlink is found. How exactly do you determine if this specific symlink should be faked as a directory vs others? Symlinks just hold values that, if they actually resolve, will be followed transparently as paths by certain calls. Not all. And people use symlinks for all kinds of purposes. What if a symlink points outside the pool? What if it points to another symlink? What if it points to a file? What happens if one of the symlinks across the drives doesn't point to a directory in the pool? There are a lot of situations.
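The situations listed above can be told apart from a script. The following is a rough sketch only: the function name `classify_symlink` and its category names are hypothetical, and it says nothing about how mergerfs itself would decide. It relies on `readlink -e` from GNU coreutils, which resolves chains of symlinks and fails when the final target does not exist.

```shell
#!/usr/bin/env bash
# classify_symlink LINK BRANCH...
# Prints one of: not-a-symlink, dangling, inside, outside.
# "inside" means the fully resolved target lies under one of the branch roots.
classify_symlink() {
    local link=$1; shift
    [[ -L "$link" ]] || { echo "not-a-symlink"; return; }
    local target
    # Resolves symlinks pointing to symlinks; fails for dangling links.
    target=$(readlink -e -- "$link") || { echo "dangling"; return; }
    local branch
    for branch in "$@"; do
        branch=$(readlink -e -- "$branch") || continue
        if [[ "$target" == "$branch" || "$target" == "$branch"/* ]]; then
            echo "inside"; return
        fi
    done
    echo "outside"
}
```

For example, `classify_symlink /NAS/ND1/timeline/Vandaag /NAS/ND1 /NAS/ND2` would report whether that link ends up under one of the two drives. Even with such a classification, the question of which category should be presented as a directory remains a policy decision.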
You didn't answer my questions about why you can't just symlink into the pool or why transparent changing of links is better than rewrites or changing your links.
Jup, the pool is mounted at /NAS/timeline, which is aggregated from all /NAS/ND*/timeline-directories pointing to /NAS/ND*/timeline-directories.
I'm not clear. Do you or do you not use the symlinks outside the pool? What is preventing you from changing or creating new links that point into the pool?
I'm not sure you're understanding what I'm asking.
That might be the case, since it's not my native language.
The problem is managing the many different usecases at one time. A symlink is found. How exactly do you determine if this specific symlink should be faked as a directory vs others?
I would say, only the symlinks pointing to an existing directory. If a symlink points to a file, it should act like a file; it already does, and most applications can handle that, though there might be usecases (not mine) in which it would be handy if links were handled like files themselves. The point in my usecase (and perhaps others) is that a symlink should aggregate and act like a directory, so it should transparently be handled as if it were a directory. In my case the symlinks actually point outside the pool, since the pools are mounted at /NAS/data (-> /NAS/ND*/data; current data state) and /NAS/timeline (-> /NAS/ND*/snapshots; earlier data states).
So I can imagine having a bunch of options:
rewrite_symlinks=[no|inside|wide] ; don't rewrite them, only rewrite those pointing to another location in the pool, rewrite also those pointing outside the pool.
symlinks_pattern = '/paths/in/which/to/rewrite:/path/b/*' ; might be added for safety reasons, to restrict especially in the case
I'm not clear. Do you or do you not use the symlinks outside the pool? What is preventing you from changing or creating new links that point into the pool?
Well, that could be an option, but is quite complicated case at my side. I make all snapshots in /NAS/ND*/snapshots (also for backup reasons, since you have a guaranteed frozen, consistent state of all files). Only the snapshots contributing to the timeline should then reside in /NAS/ND*/timeline, and I would have to rename them each time (today > yesterday, yesterday > day before yesterday, this week > last week, last week > two weeks ago). That makes it a lot more complicated to guarantee consistency, especially if you decide to make a snapshot at a random time in between.
So this is my script now:
TimelineCreate () {
    local iBestTimeDiff=0 iKeep=0 sGroup="" iI=0
    while [[ -n "$1" ]]; do
        case "$1" in
            --timediff|-td)
                [[ "$2" =~ ^([-+]?[0-9]*[yYdDmHhMSs]?)+$ ]] || error_exit "Timediff is invalid"
                shift
                local iT=0 iF=0
                for (( iI=0 ; iI < ${#1}; iI++ )); do
                    case "${1:$iI:1}" in
                        [0-9])
                            let iT*=10
                            (( iT += ( iT < 0? -1 : 1 ) * ${1:$iI:1} ))
                            #echo ${1:$iI:1}
                            #echo $iT
                            [[ "${1:$(( $iI - 1 )):1}" == "-" ]] && let iT=-iT
                            ;;
                        y|Y)
                            iF=31556952
                            ;;
                        M)
                            iF=2629746
                            ;;
                        d|D)
                            iF=86400
                            ;;
                        H|h)
                            iF=3600
                            ;;
                        m)
                            iF=60
                            ;;
                        S|s)
                            iF=1
                            ;;
                    esac
                    if (( iF > 0 )) ; then
                        (( iT )) || iT=1
                        let iBestTimeDiff+=$iT*$iF iF=0 iT=0
                    fi
                done
                #echo "$iT"
                let iBestTimeDiff+=$iT
                (( iBestTimeDiff > 0 )) || let iBestTimeDiff=-iBestTimeDiff
                ;;
            --group|-g)
                sGroup="$2"
                shift
                ;;
            --keep|-k)
                [[ "$2" =~ ^[0-9]+$ ]] || error_exit "Number of backups to keep is invalid"
                shift
                iKeep="$1"
                ;;
            --)
                shift
                break
                ;;
            *)
                (( iKeep > 0 || $# == 1 )) || let iKeep=$#-1
                break;
        esac
        shift
    done
    TimelineTidy "$sGroup"
    local sSSDate=$( date +"%y%m%d_%H%M%S" ) sMp
    for sMp in "$sNASPath/ND"*
    do
        [[ ! -d "$sMp" ]] && continue
        isBtrFSSubvolume "$sMp/data" || error_exit "$sMp is not a NAS-data mountpoint!"
        local sSSName="$sMp/snapshots/$sSSDate" asSSs=()
        [[ "$sGroup" == "" ]] || sSSName+=" $sGroup"
        ( btrfs subvolume snapshot -r "$sMp/data" "$sSSName" > /dev/null && echo "Snapshot created of $sMp" ) || error_exit "Snapshot creation failed!"
        if (( iKeep > 0 )); then
            local aiKeeps=($( seq 1 $(( iKeep - 1))))
            mapfile -t asSSs <<< $( find -H "$sMp/snapshots/" -mount -maxdepth 1 -mindepth 1 -type d -regextype sed -regex ".*/[0-9]\{6\}_[0-9]\{6\}$( [ -z "$sGroup" ] || echo "\s$sGroup" )" | sort -r )
            if (( iKeep > 1 )); then
                if (( iBestTimeDiff > 0 )); then
                    local aiComb=("${aiKeeps[@]}") iFirstDate=0 aiSSDiffs=() aiDiffs=() iDiff=0 iMinDiff=-1
                    function interpretSSDate () {
                        date -d "$( echo "$1" | sed -e 's/.*\/\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)_\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\).*/20\1-\2-\3T\4:\5:\6/g' )" +%s
                    }
                    iFirstDate=$( interpretSSDate "${asSSs[0]}" )
                    for iI in "${!asSSs[@]}"; do
                        aiSSDiffs[$iI]=$(( iFirstDate - $( interpretSSDate "${asSSs[$iI]}" ) ))
                    done
                    while true ; do
                        iDiff=0 iI=0
                        for iI in "${!aiComb[@]}"; do
                            # Big numbers, therefore with dc
                            aiDiffs=($( dc -e "${aiSSDiffs[${aiComb[$iI]}]} $(( iI == 0? 0 : ${aiSSDiffs[${aiComb[$iI - 1]}]} )) - $iBestTimeDiff - 2 ^ $iDiff + p ${iMinDiff/-/_} - p" ))
                            iDiff=${aiDiffs[0]}
                            [[ "${aiDiffs[1]:0:1}" == "-" ]] || [[ "$iMinDiff" == "-1" ]] || break
                        done
                        aiDiffs=($( dc -e "$iDiff p ${iMinDiff/-/_} - p" ))
                        if [[ "$iMinDiff" == "-1" || "${aiDiffs[1]:0:1}" == "-" ]]; then
                            iMinDiff="$iDiff"
                            aiKeeps=("${aiComb[@]}")
                        fi
                        iI=$(( ${#aiComb[@]} - 1 ))
                        while (( ++aiComb[$iI] > ( ${#asSSs[@]} + $iI - ${#aiComb[@]} ) )); do
                            (( --iI < 0 )) && break 2
                        done
                        while (( ++iI < ${#aiComb[@]} )); do
                            aiComb[$iI]=$(( aiComb[iI - 1] + 1 ))
                        done
                    done
                    for (( iI = 1; iI < ${#asSSs[@]}; ++iI )); do
                        local iJ
                        for iJ in "${aiKeeps[@]}"; do
                            (( iI == iJ )) && continue 2
                        done
                        ( btrfs subvolume delete -c "${asSSs[$iI]}" >> /dev/null && echo "Removed snapshot: ${asSSs[$iI]}" ) || echo "Couldn't remove snapshot ${asSSs[$iI]}"
                    done
                fi
            fi
        fi
        mapfile -t asSSs <<< $( find -H "$sMp/snapshots/" -mount -maxdepth 1 -mindepth 1 -type d -regextype sed -regex ".*/[0-9]\{6\}_[0-9]\{6\}$( [ -z "$sGroup" ] || echo "\s$sGroup" )" | sort -r )
        local sLink="$sGroup" sFile=""
        for (( iI=0 ; iI++ < ${#asSSs[@]} ; )); do
            (( $iI <= $# )) && sLink="${!iI}"
            sFile="$sMp/timeline/$sLink$( (( $iI > $# )) && echo " $(( $iI - $# ))" )"
            [ -d "$sFile" ] && rm "$sFile"
            # -s is required here: without it ln attempts a hard link, which is not allowed for directories.
            ( ln -sfn "$sMp/snapshots/$( basename "${asSSs[$(( iI - 1 ))]}" )" "$sFile" > /dev/null && echo "Created symlink ${sFile##*/} in timeline!" ) || echo "Error: Couldn't create symlink $sLink!"
        done
    done
}
TimelineTidy () {
    local sLink sPath iI
    for sLink in "$sNASPath/ND"*"/timeline/"* ; do
        [[ -L "$sLink" ]] || continue
        sPath=$( readlink -f "$sLink" )
        if [[ -d "$sPath" ]]; then
            # iI is incremented in the loop condition only, so every group argument is checked.
            for (( iI=0 ; iI++ < $# ; )); do
                [[ -z "$( basename "$sPath" | grep -e "^[0-9]\{6\}_[0-9]\{6\}$( [[ -z "${!iI}" ]] || echo " ${!iI}" )$" )" ]] || ( unlink "$sLink" && echo "Removed symlink ${sLink##*/}" ) || echo "Error: Couldn't remove ${sLink##*/}"
            done
        else
            # The target no longer exists: drop the dangling link.
            ( unlink "$sLink" > /dev/null && echo "Removed symlink $sLink" ) || echo "Error: Couldn't remove $sLink"
        fi
    done
}
TimelineCreate -g "daily" -td D "Vandaag" "Gisteren" "Eergisteren"
TimelineCreate -g "weekly" -td 7D "This week" "Last Week"
TimelineCreate -g "monthly" -td M "This month" "Last Month"
In my usecase it is of course also an option to mount a bunch of aggregated folders in /NAS/timeline. So, building on the example script I posted a few comments back:
TimelineTidy () {
    local sTimeLinePoint sPath iI sMountedMFS
    # Tear down the per-entry mergerfs mounts from the previous run.
    # rmdir (not rm -r) so nothing is deleted if the unmount failed.
    for sMountedMFS in "/NAS/timeline/"* ; do
        mountpoint "$sMountedMFS" && umount "$sMountedMFS"
        rmdir "$sMountedMFS"
    done
    local -A asPaths
    for sTimeLinePoint in "/NAS/ND"*"/timeline/"* ; do
        [[ -L "$sTimeLinePoint" ]] || continue
        sPath=$( readlink -f "$sTimeLinePoint" )
        if [[ -d "$sPath" ]]; then
            for (( iI=0 ; iI++ < $# ; )); do
                [[ -z "$( basename "$sPath" | grep -e "^[0-9]\{6\}_[0-9]\{6\}$( [[ -z "${!iI}" ]] || echo " ${!iI}" )$" )" ]] || ( unlink "$sTimeLinePoint" && echo "Removed symlink ${sTimeLinePoint##*/}" ) || echo "Error: Couldn't remove ${sTimeLinePoint##*/}"
                [[ -e "$sPath" ]] && asPaths["$( basename "$sTimeLinePoint" )"]+=":$sPath"
            done
        else
            ( unlink "$sTimeLinePoint" > /dev/null && echo "Removed symlink ${sTimeLinePoint##*/}" ) || echo "Error: Couldn't remove ${sTimeLinePoint##*/}"
        fi
    done
    # Mount one mergerfs pool per symlink name (":1" strips the leading colon).
    for sTimeLinePoint in "${!asPaths[@]}" ; do
        sMountedMFS="/NAS/timeline/$sTimeLinePoint"
        mkdir "$sMountedMFS"
        mergerfs -o allow_other,use_ino "${asPaths[$sTimeLinePoint]:1}" "$sMountedMFS"
    done
}
Well, that could be an option, but is quite complicated case at my side.
I don't understand what is complicated. mergerfs isn't going to magically fix the race condition. There is nothing mergerfs can do to manage the fact that you could have symlinks pointing to different locations that aren't, from your perspective, aligned properly.
mergerfs can't do anything with regard to rewriting of symlinks that you can't already do in your script. Instead of setting symlink "/NAS/ND1/timeline/today" to "/NAS/ND1/snapshots/<date_time>" you set it to "/NAS/timeline/snapshots/<date_time>". You just have to keep the snapshot directories aligned name wise which from what you showed you do.
And that's no different from what symlink rewriting would be.
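As a sketch of the suggestion above, the snapshot script could write the pool path into each link instead of the branch path. The helper name `point_into_pool` and its argument order are hypothetical; the `snapshots/<name>` layout under the pool follows the example paths given above.

```shell
#!/usr/bin/env bash
# point_into_pool POOL_ROOT BRANCH_ROOT LABEL SNAPSHOT_NAME
# Creates or replaces BRANCH_ROOT/timeline/LABEL as a symlink whose value is a
# path under the pool mount rather than a path under the branch itself.
point_into_pool() {
    local pool_root=$1 branch_root=$2 label=$3 snapshot_name=$4
    mkdir -p "$branch_root/timeline"
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file instead of descending into it.
    ln -sfn "$pool_root/snapshots/$snapshot_name" "$branch_root/timeline/$label"
}
```

Called as `point_into_pool /NAS/timeline /NAS/ND1 Vandaag "200619_155419 daily"` on every drive, each copy of the link would hold the same value, so it would not matter which branch the link is read from.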
In fact, that would ensure that the race condition doesn't manifest the way it would if a union of symlinked directories were merged: whichever symlink is returned, only files from that one timestamped directory across all drives will be seen.
rewrite also those pointing outside the pool.
I don't understand what you mean. How would you rewrite links pointing outside the mount? Do you mean follow links? That's not the same thing.
rewriting a symlink means literally to change the content of the value returned back to the client software. It is a symlink and instead of value A on readlink it returns B.
following a symlink means not reporting back symlinks. It means transparently following them.
These are very different. rewriting is easy and in my opinion not very useful unless you can't control the value. following is more complicated in a number of ways and results in a different behavior from clients. And as with Samba... there would be some security concerns that need to be considered.
I don't understand what is complicated.
I indeed also got the feeling, we're talking past each other at multiple points. Actually, kind of amusing. Since I was thinking my comprehension of the English language wasn't that bad at all, but apparently I have to reconsider that opinion ;).
mergerfs isn't going to magically fix the race condition.
Indeed, mergerFS actually always had those race-conditions. Consider the example of having a directory in one path of the pool, and a file with the same name/subpath in the other path of the pool. I actually don't see what symlinks would make that different.
mergerfs can't do anything with regard to rewriting of symlinks that you can't already do in your script. Instead of setting symlink "/NAS/ND1/timeline/today" to "/NAS/ND1/snapshots/<date_time>" you set it to "/NAS/timeline/snapshots/<date_time>". You just have to keep the snapshot directories aligned name wise which from what you showed you do.
That could be an option at first sight, I agree (if you only use relative symlinks, since absolute symlinks will not point to the aggregated/merged directory e.g. '/NAS/ND1/timeline/snapshots/<date_time> daily' instead of '/NAS/timeline/snapshots/<date_time> daily' in this case). But under some circumstances the <date_time> may differ on the different disks. Therefore your solution probably will work most of the days, but not every day.
I don't understand what you mean. How would you rewrite links pointing outside the mount? Do you mean follow links? That's not the same thing.
Ok, that's one of the issues of talking past each other. Reading your explanation, I meant following the symlinks transparently, and, when a symlink refers to a directory, handling that path exactly as if it were a 'hard link' to the directory (which doesn't exist).
rewriting is easy and in my opinion not very useful unless you can't control the value.
That's something I totally agree on. Especially if it only means changing it from an absolute to a relative path.
And as with Samba... there would be some security concerns that need to be considered.
Yep, I agree. That's really a point of consideration, and I'm aware of that. It is not that big an issue for me, since it's on the local LAN (only used by my wife and me) behind a firewalled router, and Samba requires a password. But especially if you generalize the behaviour I wish for to all usecases, I can imagine options for restricting the transparent following of these symlinks to certain paths of the pool and/or limiting where they may point (for example only within the pool, only the same filesystem, everywhere except the system filesystem, everywhere, restriction by patterns).
Re-reading this stub and your reaction, I'm getting the feeling we're getting a closer understanding. But if you don't feel the same way, feel free to say since I'm open to consider other options to discuss this over. Since I think, it would be an (unique) option with additive value for the mergerFS.
That could be an option at first sight, I agree (if you only use relative symlinks, since absolute symlinks will not point to the aggregated/merged directory e.g. '/NAS/ND1/timeline/snapshots/<date_time> daily' instead of '/NAS/timeline/snapshots/<date_time> daily' in this case).
Either we're talking past each other or you don't quite understand how it would work.
You can use absolute paths. You'd set all of them to /NAS/timeline/snapshots/<date_time> daily. When the policy runs a single symlink is chosen and followed. It points to a directory within the pool. When the readdir occurs on the path you will get the union of /NAS/ND*/timeline/snapshots/<date_time> daily. Just as if you went to that path yourself within the pool.
But under some circumstances the <date_time> may differ on the different disks. Therefore your solution probably will work most of the days, but not every day.
Well then you're talking about creating a union of non-overlapping things then and that's a separate concept. An alternative is that you could just as well bindmount rather than symlink these paths and it'd work fine without changes to mergerfs.
Ok, that's one of the issues of talking past each other. Reading your explanation, I meant following the symlinks transparently, and, when a symlink refers to a directory, handling that path exactly as if it were a 'hard link' to the directory (which doesn't exist).
OK. I was using it as used in Samba and other places.
That's something I totally agree on. Especially if it only means changing it from an absolute to a relative path.
I wasn't referring to changing absolute to relative but base path substitution. Replacing known branch values with the mount point.
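Base path substitution as described here is a pure string operation on the link value. The sketch below models it in a script; the function name `rewrite_link_value` is hypothetical and this is not code from mergerfs.

```shell
#!/usr/bin/env bash
# rewrite_link_value VALUE MOUNT_POINT BRANCH...
# Prints VALUE with a leading BRANCH replaced by MOUNT_POINT. Values that do
# not start with a known branch (relative links included) are printed unchanged.
rewrite_link_value() {
    local value=$1 mount_point=$2; shift 2
    local branch
    for branch in "$@"; do
        branch=${branch%/}
        # Match whole path components only, so /NAS/ND1 does not match /NAS/ND10.
        if [[ "$value" == "$branch" || "$value" == "$branch"/* ]]; then
            printf '%s\n' "$mount_point${value#"$branch"}"
            return
        fi
    done
    printf '%s\n' "$value"
}
```

With branches /NAS/ND1 and /NAS/ND2 and a mount point of /NAS/pool, the value `/NAS/ND1/snapshots/200619_155419 daily` would come back as `/NAS/pool/snapshots/200619_155419 daily`. The result is still a symlink value, not a followed link.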
Re-reading this stub and your reaction, I'm getting the feeling we're getting a closer understanding. But if you don't feel the same way, feel free to say since I'm open to consider other options to discuss this over. Since I think, it would be an (unique) option with additive value for the mergerFS.
There are other problems with symlink following. Creating symlinks I think will be broken. The reason it's fine in Samba is because the concept of symlinks doesn't exist to begin with within the setup. But when working with FUSE I'm not sure I can work around the situation where a symlink is requested but a directory is created. The kernel considers that an error. It's the same problem as with the symlinkify feature. Maybe there is a workaround but I need to investigate. There is also the fact that directories and symlinks are different when considering other syscalls. Trying to properly handle every one of those may not be so easy.
It's really quirky but some of it might work. I have to very carefully manage the type so as not to accidentally give the kernel the wrong file type otherwise it'll error. And need to refactor some core code so as to be able to immediately flush the information in the kernel once given to it. Otherwise caching would lead to weird behavior. And you can't just return a directory when creating a symlink. The kernel will error. There are other problems too. How does rmdir work? It can't be atomic. If it follows the link it will have to rmdir the linked directories and then try to unlink the symlink. If any of the rmdir's fail due to not being empty it'd be in a partial state. Creates would also be weird because what is pointed at may not be the same filesystem as the branch checked.
Have you considered using bind mounts rather than symlinks? Or are you not running as a privileged user?
If not I'll see what I can get working. My fear is that there will be a number of edge cases and I'll need to audit every function. It'd possibly be a pretty big change.
An alternative is that you could just as well bindmount rather than symlink these paths and it'd work fine without changes to mergerfs.
Have you considered using bind mounts rather than symlinks? Or are you not running as a privileged user?
Yeah, that option also crossed my mind. Or the easier one in this case would be to mount those directories themselves, since all these timeline directories are BtrFS subvolumes. But then the next problem will be persistence after a reboot, so I would have to keep some record somewhere. In the end the whole problem is, that you just cannot have more than one hard link to a directory. In each case within the filesystems I know so far.
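One way to keep such a record across reboots would be bind entries in /etc/fstab. The sketch below only prints candidate lines and writes nothing. The function name `fstab_bind_entry` is hypothetical, and only spaces are escaped (as `\040`, the fstab convention), which matters here because the snapshot names contain a space.

```shell
#!/usr/bin/env bash
# fstab_bind_entry SOURCE_DIR MOUNT_POINT
# Prints an fstab line that bind-mounts SOURCE_DIR at MOUNT_POINT.
fstab_bind_entry() {
    local src=$1 dst=$2
    # fstab fields are whitespace-separated, so spaces are written as \040.
    src=${src// /\\040}
    dst=${dst// /\\040}
    printf '%s %s none bind 0 0\n' "$src" "$dst"
}

fstab_bind_entry "/NAS/ND1/snapshots/200619_155419 daily" "/NAS/ND1/timeline/Vandaag"
```

Because the entries would have to be regenerated whenever the timeline rotates, this moves the bookkeeping from symlinks into fstab rather than removing it.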
The reason it's fine in Samba is because the concept of symlinks doesn't exist to begin with within the setup.
What do you mean? In the default configuration, Samba indeed doesn't support symlinks. But you have some options for them, even to follow symlinks outside the shared directory: https://www.samba.org/samba/docs/using_samba/ch08.html
Creating symlinks I think will be broken.
Yup, that's life: people should make sure their files are consistently correct.
It's really quirky but some of it might work. I have to very carefully manage the type so as not to accidentally give the kernel the wrong file type otherwise it'll error. (...). Creates would also be weird because what is pointed at may not be the same filesystem as the branch checked.
So after all, it's not as easy as it looked to me at first sight ;). I really think this would be a unique and interesting feature, but it has to be thought through really thoroughly before the implementation.
My first thoughts would be: creating files and so on should work transparently. But when it comes to removing the directory itself, I would say it should be interpreted as 'unlink the symlinks' (and not as deleting the original directory). And if the original directory is deleted, then that's also the end of the consistency of the file system, because the symlink will be broken by that action.
If not I'll see what I can get working. My fear is that there will be a number of edge cases and I'll need to audit every function. It'd possibly be a pretty big change.
Think and rethink it over. Especially, whether you also see the added value of this option. Since you're the boss and also the one who's gonna get back any complaints concerning bugs/issues. I already have a functioning workaround for the time being (I now read all paths from the symlinks, and then mount them joined together in '/NAS/timeline' using your mergerFS).
In the end the whole problem is, that you just cannot have more than one hard link to a directory. In each case within the filesystems I know so far.
Yeah. There were complications with links to directories back in the early days of Unix/POSIX and I believe it was dropped mostly. I think Apple's Time Machine uses a filesystem that has directory links, but no others as far as I know.
rmdir
Removing just the symlink seems reasonable.
Think and rethink it over. Especially, whether you also see the added value of this option. Since you're the boss and also the one who's gonna get back any complaints concerning bugs/issues. I already have a functioning workaround for the time being (I now read all paths from the symlinks, and then mount them joined together in '/NAS/timeline' using your mergerFS).
Last night I was able to get a very early prototype working. Had to refactor some low level code because of how libfuse works (it's not the best API). You can create symlinks but they immediately turn into whatever is appropriate. I make sure to pass back to the kernel that it's a symlink, but when using this feature it'd set the caching to 0 seconds to force it to check again. Then it'll be reported as whatever it's supposed to be. Still need to add rmdir and look over all the other functions to see how to properly handle them. I've switched focus to inode calculation due to a few NFS issues people have reported but I'll get back to the follow symlink thing soon.
BTW... mergerfs has a runtime config interface and you can keep a mount around and have a script that just sets the branches on the fly.
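A rough sketch of how that could fit the workaround script: build the branch list from the symlinks, then hand it to the running mount. The xattr key `user.mergerfs.branches` and the control file `<mount>/.mergerfs` are assumptions here, since key names have differed between mergerfs versions; check the documentation of the installed version. The sketch therefore only prints the command. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# join_link_targets LINK...
# Prints the resolved directory targets of the given symlinks, joined by ':'.
join_link_targets() {
    local link target joined=""
    for link in "$@"; do
        [[ -L "$link" ]] || continue
        target=$(readlink -e -- "$link") || continue   # skip dangling links
        [[ -d "$target" ]] || continue
        joined+="${joined:+:}$target"
    done
    printf '%s\n' "$joined"
}

# Dry run: print the command that would update the branches of a live mount.
# ASSUMPTION: control file "<mount>/.mergerfs" and key "user.mergerfs.branches"
# match the installed mergerfs version; verify before executing the output.
print_set_branches() {
    local mount_point=$1 branches=$2
    printf 'setfattr -n user.mergerfs.branches -v %q %q\n' "$branches" "$mount_point/.mergerfs"
}
```

For example, `print_set_branches /NAS/timeline/Vandaag "$(join_link_targets /NAS/ND*/timeline/Vandaag)"` would show the command for one timeline entry without touching the mount.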
You still interested in this feature? If you're able to build and test yourself I've got a branch with a first draft. https://github.com/trapexit/mergerfs/tree/follow-symlinks
I'm not going to include it in 2.30.0 but once you can sign off on the behavior I'll include it. Not sure if there is anything off or missing for your usecase.
Of course I'm still very interested in this feature. One of the few downsides of spending holidays elsewhere is that I didn't have access to my system, and I will be working again next week with probably a lot of work piled up. So, I hope I'll be able to test it at the end of next week. Besides, what does the option 'regular' mean? I'd like to know in order to be able to test the different options listed there.
The type of file it points to. There are a number of types of files. Directory, regular, symlink, fifo, char device, block device, socket, etc.
So if you set it to directory it only follows symlinks to directories. Regular to regulars.
https://www.gnu.org/software/libc/manual/html_node/Testing-File-Type.html
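The same file types can be checked from a shell, which may help when testing the branch. This is a minimal sketch: the helper names `target_type` and `should_follow` are hypothetical and are not the option names used by the follow-symlinks branch. The bash test operators used here follow symlinks, so they report the type of what a link points to.

```shell
#!/usr/bin/env bash
# target_type PATH
# Prints the type of PATH after following symlinks.
target_type() {
    local path=$1
    if   [[ ! -e "$path" ]]; then echo "missing"
    elif [[ -d "$path" ]]; then echo "directory"
    elif [[ -f "$path" ]]; then echo "regular"
    elif [[ -p "$path" ]]; then echo "fifo"
    elif [[ -S "$path" ]]; then echo "socket"
    elif [[ -b "$path" ]]; then echo "block"
    elif [[ -c "$path" ]]; then echo "char"
    else echo "other"
    fi
}

# should_follow LINK WANTED_TYPE
# Succeeds only if LINK is a symlink whose target has the wanted type.
should_follow() {
    [[ -L "$1" && "$(target_type "$1")" == "$2" ]]
}
```

With a setting of `directory`, a link such as /NAS/ND1/timeline/Vandaag would qualify, while a link to install.sh would only qualify under `regular`.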
Did you find the time to check it out?
I'm not sure if I'm seeing a variation of this issue, but here is my use case that didn't work as I expected - involving symlinks.
I have a system with multiple drives, simply because the drives aren't big enough to hold everything. I'm creating a folder with a bunch of symlinks, to present a merged view of the content.
But, then I have a secondary level of things I wanted to merge - say, on folder named "validated" and one folder named "unvalidated". I was using mergerFS to join the contents of those two folders.
So, it looks like:
/partialMerge/Validated/Content/A
/partialMerge/Unvalidated/Content/A
The path /partialMerge/Validated is a symlink to disk1.
The path /partialMerge/Unvalidated is a symlink to disk2.
However, due to space constraints, /partialMerge/Validated/Content/A is a symlink to disk3.
I use mergerfs in fstab:
/partialMerge/Validated/Content:/partialMerge/Unvalidated/Content /mnt/FullMerge/Content fuse.mergerfs ro,allow_other,use_ino,cache.files=off,dropcacheonclose=true 0 0
When I then view /mnt/FullMerge/Content/A, I only see files from /partialMerge/Validated/Content/A
I do not see any files from /partialMerge/Unvalidated/Content/A
I'm using:
mergerfs version: 2.28.1
FUSE library version: 2.9.7-mergerfs_2.28.0
fusermount version: 2.9.9
using FUSE kernel interface version 7.29
I believe the rest of my config is correct, because other subfolders merge correctly, the difference being, they don't have the extra symlink.
A symlink is a specific reference to something. You can't make a union of a symlink any more than you can create a union of two separate files. You can select a specific symlink like you can select a specific file. You would need to follow links (to directories) to merge them. Otherwise just use bind mounts which in your case seems more appropriate. If you blanketly follow symlinks it can confuse certain software that use symlinks or as mentioned in the "follow link" feature in Samba can lead to security issues.
Is there an option for following links to directories for mergerfs?
Otherwise, yea, I guess I'll convert it to a bind mount.
Just was unaware I even had the issue until I noticed something missing yesterday, and couldn't figure out why.
No, there isn't. That's what this feature request is about. However, if you have a fully qualified symlink in the same relative path to the directory it should follow it (opendir will follow the link). It doesn't explicitly check for that situation. It could lead to some odd issues though. Particularly if your layout changes and getattr ends up picking the symlink branch rather than the directory branch.