Proposal: an option for union mount of symlinks of directories or recursive archives
Currently, given the following union mount layout:
- source1/
  - outer/
    - inner1/
      - file1
    - inner -> inner1
- source2/
  - outer/
    - inner2/
      - file2
    - inner -> inner2
If we do a union bind mount:
ratarmount source1 source2 merged
merged/outer/inner will be a symbolic link to inner2, so it contains file2 but not file1. In contrast, archive files that are not symbolic links are currently union mounted recursively, which is inconsistent with the symbolic link behavior.
It would be useful to have an option to merge and resolve symbolic links, so that merged/outer/inner becomes a regular directory that contains both file1 and file2.
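To make the difference concrete, here is a minimal toy model in Python. It does not use ratarmount's API: the nested dicts, the `Symlink` class, and the `union` and `resolve_links` helpers are all made up for illustration, and only links to sibling directories are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symlink:
    target: str  # name of a sibling entry (toy model only)


# Nested dicts are directories, strings are file contents.
source1 = {"outer": {"inner1": {"file1": "1"}, "inner": Symlink("inner1")}}
source2 = {"outer": {"inner2": {"file2": "2"}, "inner": Symlink("inner2")}}


def union(branches):
    """Merge directories recursively; for anything else the last branch wins."""
    merged = {}
    for branch in branches:
        for name, entry in branch.items():
            if isinstance(entry, dict) and isinstance(merged.get(name), dict):
                merged[name] = union([merged[name], entry])
            elif isinstance(entry, dict):
                merged[name] = union([entry])
            else:
                merged[name] = entry
    return merged


def resolve_links(directory):
    """Replace each symlink to a sibling directory by that directory's contents."""
    resolved = {}
    for name, entry in directory.items():
        if isinstance(entry, Symlink) and isinstance(directory.get(entry.target), dict):
            entry = directory[entry.target]
        resolved[name] = resolve_links(entry) if isinstance(entry, dict) else entry
    return resolved


current = union([source1, source2])
proposed = union([resolve_links(source1), resolve_links(source2)])

print(current["outer"]["inner"])           # Symlink(target='inner2')
print(sorted(proposed["outer"]["inner"]))  # ['file1', 'file2']
```

In this model the proposed option amounts to resolving the links in each source before the sources are merged.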
I am willing to work out a more complete proposal about the exact behavior and contribute an implementation if it's a good idea.
Conceptually, recursive union mount of archives should be a more powerful feature than recursive union bind mount of symlinks. Maybe a better behavior is to imply the recursive union bind mount of symlinks when the --recursive flag is present.
This can be done by introducing a new experimental implementation of FileVersionLayer to handle both hard links and symbolic links.
Your example makes sense, but having different versions of symbolic links also makes sense. I find it difficult to decide between one of the two.
Conceptually, recursive union mount of archives should be a more powerful feature than recursive union bind mount of symlinks. Maybe a better behavior is to imply the recursive union bind mount of symlinks when the --recursive flag is present.
This semantic overloading feels wrong.
I think what would make sense to generically offer is some kind of --resolve-symbolic-links option that should make all symlinks behave as if they were files or folders. This might be doable similarly to the FileVersionLayer, but of course in a different MountSource-derived class. Maybe this is what you meant with your last comment. If this symlink-resolving is done before the union mounting, your desired effect should be achieved. I also feel like symbolic link resolution has been requested previously, but I can't remember exactly.
There are some edge cases preventing relative links from being resolved in UnionMountSource. I think we will need to change FileVersionLayer to handle these edge cases.
(Sent as an email reply to the comment by Maximilian Knespel (mxmlnkn) of Mon, Jun 30, 2025 at 10:56, mxmlnkn/ratarmount#160: https://github.com/mxmlnkn/ratarmount/issues/160#issuecomment-3020190243)
Given the following directory layout:
- branch1/
  - subdir0 -> ./subdir1
  - subdir1/
    - subdir2/
      - file1
- branch2/
  - subdir0/
    - subdir2/
      - file2
  - subdir1/
    - subdir2/
      - file3
If we do a union bind mount with the proposed --resolve-symbolic-links flag:
ratarmount --resolve-symbolic-links branch1 branch2 merged
Then we do
ls merged/subdir0/subdir2
What would be the expected behavior?
I think that when resolving merged/subdir0, it would be useful for the symbolic link subdir0 -> ./subdir1 to be resolved to merged/subdir1 instead of branch1/subdir1, because symbolic links are supposed to be late binding.
Therefore, merged/subdir0/subdir2 should include file1 and file3 from merged/subdir1/subdir2 (the resolved branch1/subdir0/subdir2) and file2 from branch2/subdir0/subdir2. That is, ls merged/subdir0/subdir2 should show all three files: file1, file2, and file3.
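The expected listing can be reproduced with a small toy model of late binding. This is only a sketch under assumptions: `Symlink`, `find_dirs`, and `list_dir` are hypothetical names, not ratarmount code, absolute link targets are ignored, and relative targets are joined to the logical path rather than the physical one.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Symlink:
    target: str


# Nested dicts are directories, strings are (empty) file contents.
branch1 = {"subdir0": Symlink("./subdir1"), "subdir1": {"subdir2": {"file1": ""}}}
branch2 = {"subdir0": {"subdir2": {"file2": ""}}, "subdir1": {"subdir2": {"file3": ""}}}


def find_dirs(branches, path, depth=0):
    """Return every branch directory that contributes to `path` in the merged view."""
    if depth > 40:  # crude guard against symlink loops
        return []
    dirs = list(branches)
    parent = ""
    for name in [part for part in path.split("/") if part]:
        found = []
        for directory in dirs:
            entry = directory.get(name)
            if isinstance(entry, dict):
                found.append(entry)
            elif isinstance(entry, Symlink) and not entry.target.startswith("/"):
                # Late binding: look the target up in the merged view,
                # not only in the branch that contains the link.
                target = posixpath.normpath(posixpath.join(parent, entry.target))
                found.extend(find_dirs(branches, target, depth + 1))
        dirs = found
        parent = posixpath.join(parent, name)
    return dirs


def list_dir(branches, path):
    """List the names in the merged directory at `path`."""
    return sorted({name for directory in find_dirs(branches, path) for name in directory})


print(list_dir([branch1, branch2], "subdir0/subdir2"))  # ['file1', 'file2', 'file3']
```

With branch1 alone, the same lookup yields only file1, because the link then resolves to the only subdir1 that exists.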
@mxmlnkn Do you think the behavior makes sense? I am willing to implement such behavior in my pull request if you agree.
I am trying to implement union mount of symbolic links as a layer between AutoMountLayer and FileVersionLayer.
However, I cannot distinguish versions from a TAR file and versions from branches of UnionMountSource, because UnionMountSource will treat overlapped files as versions, too.
In my use case, union mounting old versions of symbolic links from a TAR is not a problem, but it would be surprising behavior if a user expects that only the newest version from a TAR takes effect.
@mxmlnkn Do you think I could introduce a flag --union-mount-symbolic-link-versions to enable such behavior, so that the behavior would be explicit?
An alternative solution would be to change the interface of MountSource to make fileVersion two-dimensional: dimension 1 is the TAR file version and dimension 2 is the branch in the union mount.
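As a rough illustration of the two-dimensional idea, the sketch below stores a (branch, TAR version) pair per occurrence. The `Version` type and the numbering are assumptions made for this example and do not reflect the actual MountSource interface.

```python
from typing import NamedTuple


class Version(NamedTuple):
    branch: int       # index of the union mount branch
    tar_version: int  # occurrence of the path inside that branch's TAR


# Hypothetical history of one path: branch 0 holds it twice, branch 1 once.
occurrences = [Version(0, 1), Version(0, 2), Version(1, 1)]

# Flattened numbering: branch and TAR occurrence can no longer be told apart.
flat = {index + 1: version for index, version in enumerate(occurrences)}

# With two dimensions, "newest occurrence per branch" is a simple query.
newest_per_branch = {}
for version in occurrences:
    newest_per_branch[version.branch] = max(
        newest_per_branch.get(version.branch, version), version
    )

print(flat[3])                             # Version(branch=1, tar_version=1)
print(sorted(newest_per_branch.values()))  # [Version(branch=0, tar_version=2), Version(branch=1, tar_version=1)]
```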
What would be the expected behavior?
These are the difficult questions I initially feared you were asking in your original post. Well, now we have reached this point. The question is somewhat equivalent to asking whether to resolve the symlink before or after merging. Currently, I think the effect is as if the symbolic links were resolved as the last step. This also explains some effects in #164. I think some effects in #164 are also explained by allowing a path to exist as a regular file and as a directory at the same time: if it is accessed as a directory, the lookup descends into it, even if that was not the currently requested version. "Fixing" or changing that behavior might also fix some of the inconsistencies, but it might slow things down, because in effect one would always have to check that every parent is a valid directory. I would not like that very much, but maybe it could be guarded behind some fast-path boolean similar to a cache.
Did you check what other existing solutions do in such cases? We don't have to reinvent the wheel. mergerfs (https://github.com/trapexit/mergerfs) should have similar decidability problems.
Whether a merged view should overwrite folders with the new contents or merge the contents might also be a decidability problem. Although, looking at incremental backups, I think my implicit decision was to merge recursively.
However, I cannot distinguish versions from a TAR file and versions from branches of UnionMountSource, because UnionMountSource will treat overlapped files as versions, too.
That sounds like the correct behavior to me. Why would you want to change this? I may want to inspect the unmerged original folder of either merge source.
I'm always hesitant to introduce new flags, as they make everything complicated and probably are hard to understand and find. The 2D version also does not convince me, but maybe because the above problem description did not convince me.
Do you think always resolving symbolic links with late binding makes sense, i.e., making the #164 case always list the three files?
Do you think always resolving symbolic links with late binding makes sense, i.e., making the #164 case always list the three files?
yes
In terms of inspecting the unmerged original folders, list and list_mode currently do not accept a fileVersion argument, so how would that be possible?
Ah right. It's not possible. I guess, ideally, the list methods would also take a file info object, but oh well. I'm not sure how much the file versions are actually used. I don't think I have ever received an issue about it, and I'll probably disable it by default in the next release to save yet another indirection.
I think the reason why folders do not support versioning is that folders in the tar format are implicitly created. There’s no such concept of folder versions in tar. The current UnionMountSource’s behavior is to merge folders into a single folder and to merge file version lists into a flattened file version list. The behavior makes sense to me. The only issue is that symbolic links of folders are versioned, unlike tar’s implicitly created folders. So if I do a union mount of symbolic links, I would like to mount all versions, not only version 0.
If a user doesn’t want to bring older symbolic links of folders into the union, they can apply FileVersionLayer on top of the tar source before passing it to UnionMountSource, in order to rename older versions to special file names, so that UnionMountSource only sees version 0 at the original file name. This approach helps distinguish branches from versions without introducing two-dimensional versions.
Do you think this approach makes sense if I make a PR to let UnionMountSource always merge all versions of symbolic links of folders?
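The renaming step described above can be sketched with plain dictionaries. The `expose_versions` helper and the '<path>.versions/<n>' naming are assumptions for this example only, not a statement about how FileVersionLayer is implemented.

```python
def expose_versions(history):
    """Map {path: [oldest, ..., newest]} to a flat namespace.

    The newest entry keeps the original path; older entries move to
    '<path>.versions/<n>' (a naming scheme assumed for this sketch).
    """
    flat = {}
    for path, versions in history.items():
        flat[path] = versions[-1]
        for number, entry in enumerate(versions[:-1], start=1):
            flat[f"{path}.versions/{number}"] = entry
    return flat


# Hypothetical TAR history: "data" was archived twice as a symbolic link.
tar_history = {"data": ["link -> old", "link -> new"], "notes.txt": ["v1"]}
flat = expose_versions(tar_history)

print(sorted(flat))  # ['data', 'data.versions/1', 'notes.txt']
print(flat["data"])  # link -> new
```

A union over such flattened sources would then see exactly one entry per original path and per branch.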
I think the reason why folders do not support versioning is that folders in the tar format are implicitly created. There’s no such concept of folder versions in tar. The current UnionMountSource’s behavior is to merge folders into a single folder and to merge file version lists into a flattened file version list. The behavior makes sense to me.
I would have agreed, if not for my tests below. Note that TAR is a bit underspecified in that part. Files can appear in the TAR without any record of the parent folder. But there are also entries for directories. One could try to create a scheme where every directory entry starts a new version of that directory. I doubted that anyone did implement it like this, but it seems I was wrong, and this is indeed the correct way to think about it. Here is the test with GNU tar's --occurrence.
rm -rf foo; echo bar > foo
tar --numeric-owner -cf foo.tar foo
rm foo; mkdir foo; echo bar1 > foo/bar1
tar --numeric-owner --append -f foo.tar foo
'rm' -r foo; ln -s foo2 foo
tar --numeric-owner --append -f foo.tar foo
unlink foo; mkdir foo; echo bar2 > foo/bar2
tar --numeric-owner --append -f foo.tar foo
tar tvlf foo.tar
# -rwx------ 1000/1000 4 2025-07-09 21:50 foo
# drwx------ 1000/1000 0 2025-07-09 21:50 foo/
# -rwx------ 1000/1000 5 2025-07-09 21:50 foo/bar1
# lrwxrwxrwx 1000/1000 0 2025-07-09 21:50 foo -> foo2
# drwx------ 1000/1000 0 2025-07-09 21:50 foo/
# -rwx------ 1000/1000 5 2025-07-09 21:50 foo/bar2
tar --occurrence=1 -tvlf foo.tar foo
# -rwx------ 1000/1000 4 2025-07-09 21:50 foo
tar --occurrence=2 -tvlf foo.tar foo
# drwx------ 1000/1000 0 2025-07-09 21:50 foo/
# -rwx------ 1000/1000 5 2025-07-09 21:50 foo/bar1
tar --occurrence=3 -tvlf foo.tar foo
# lrwxrwxrwx 1000/1000 0 2025-07-09 21:50 foo -> foo2
tar --occurrence=4 -tvlf foo.tar foo
# drwx------ 1000/1000 0 2025-07-09 21:50 foo/
# -rwx------ 1000/1000 5 2025-07-09 21:50 foo/bar2
This means that merging the folders in the TAR instead of versioning them could be classified as a bug in ratarmount. However, when extracting the TAR, which is what ratarmount tries to simulate, the effect would be similar to the folders being merged. At least that would be the case if the TAR only contained folders. So the default view is correct, but not the versioning implementation. If the TAR contains the same path as a file and a folder, then GNU tar will exit with:
tar -xf foo.tar
tar: foo: Cannot create symlink to ‘foo2’: File exists
tar: Exiting with failure status due to previous errors
It seems that extracting the file and overwriting it as a folder worked, but the folder will not be deleted when that path / file name suddenly reappears as a symbolic link.
But, I think the versioning is probably unused by most users.
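For comparison, the sequence of entries from the GNU tar test can be rebuilt in memory and grouped by name with Python's standard tarfile module. This is only a sketch: the `add` helper is made up for the example, and grouping by member name is an assumption about how occurrences might be counted, not ratarmount's indexing logic.

```python
import io
import tarfile
from collections import defaultdict


def add(archive, name, kind, data=b"", linkname=""):
    """Append one entry of the given type to the archive."""
    info = tarfile.TarInfo(name)
    info.type = kind
    info.linkname = linkname
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))


# Same order of entries as in the appended foo.tar above.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    add(archive, "foo", tarfile.REGTYPE, b"bar\n")
    add(archive, "foo", tarfile.DIRTYPE)
    add(archive, "foo/bar1", tarfile.REGTYPE, b"bar1\n")
    add(archive, "foo", tarfile.SYMTYPE, linkname="foo2")
    add(archive, "foo", tarfile.DIRTYPE)
    add(archive, "foo/bar2", tarfile.REGTYPE, b"bar2\n")

# Group all entries by path; duplicates are kept in archive order.
buffer.seek(0)
occurrences = defaultdict(list)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    for member in archive.getmembers():
        kind = "dir" if member.isdir() else "link" if member.issym() else "file"
        occurrences[member.name].append(kind)

print(occurrences["foo"])  # ['file', 'dir', 'link', 'dir']
```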
The only issue is that symbolic links of folders are versioned, unlike tar’s implicitly created folders. So if I do a union mount of symbolic links, I would like to mount all versions, not only version 0.
I am not sure about this. This is a very hard decidability problem as you have already found out. The decision and looking at other solutions, like for example GNU tar as I did above, is probably as much work as implementing it. That's why my answer takes longer and longer. I simply do not know how to best implement it... I have also pointed to mergerfs, did you check how it behaves?
If a user doesn’t want to bring older symbolic links of folders into the union, they can apply FileVersionLayer on top of the tar source before passing it to UnionMountSource, in order to rename older versions to special file names, so that UnionMountSource only sees version 0 at the original file name. This approach helps distinguish branches from versions without introducing two-dimensional versions.
FileVersionLayer should always be the top-most layer. At least that's how it was intended to be used. Anything else, or even multiple FileVersionLayer's will probably lead to subtle bugs because it was not intended for that use case.
I think some of the decidability problems could be handed over to the user by letting them decide at which point the symbolic links should be resolved, see the earlier idea about such a link resolver layer. This could be done before or after the merging. However, even though this would be easy and clear when using the ratarmountcore library, the order is currently not exposed in the command line interface. A command line interface that basically allows for such arbitrary MountSource stacking orders and options would really become something akin to the ffmpeg CLI or ImageMagick CLI. I guess at some point this is unavoidable, but it is still hard to do, to teach, and to implement (with argparse).
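The effect of the stacking order can be shown with a toy pipeline. This sketch uses flat path dictionaries and made-up `union`, `resolve_links`, and `stack` helpers rather than ratarmountcore classes, and it only handles links to directories within the same source.

```python
from functools import reduce

# Flat toy sources: path -> "dir", "file" or ("link", target path).
source1 = {"inner1": "dir", "inner1/file1": "file", "inner": ("link", "inner1")}
source2 = {"inner2": "dir", "inner2/file2": "file", "inner": ("link", "inner2")}


def union(sources):
    """Merge all sources into one; later sources override earlier ones per path."""
    merged = {}
    for source in sources:
        merged.update(source)
    return [merged]


def resolve_links(sources):
    """Replace each link to a directory by a copy of that directory's subtree."""
    result = []
    for source in sources:
        resolved = dict(source)
        for path, entry in source.items():
            if isinstance(entry, tuple) and source.get(entry[1]) == "dir":
                target = entry[1]
                resolved[path] = "dir"
                for other, kind in source.items():
                    if other.startswith(target + "/"):
                        resolved[path + other[len(target):]] = kind
        result.append(resolved)
    return result


def stack(layers, sources):
    """Apply the layers in the given order, like stacking mount sources."""
    return reduce(lambda current, layer: layer(current), layers, sources)[0]


def children(view, directory):
    """List the paths below `directory`, relative to it."""
    return sorted(p[len(directory) + 1:] for p in view if p.startswith(directory + "/"))


sources = [source1, source2]
print(children(stack([resolve_links, union], sources), "inner"))  # ['file1', 'file2']
print(children(stack([union, resolve_links], sources), "inner"))  # ['file2']
```

Resolving before merging yields the merged directory, while resolving after merging only follows the link that won the merge.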
Do you think this approach makes sense if I make a PR to let UnionMountSource always merge all versions of symbolic links of folders? …
But how does UnionMountSource know / decide that the symbolic link points to a folder? Both, the symbolic link / file / folder and the target of the link can have different versions and from different merged sources... You run into the same decidability problems when the link target has multiple versions, existing as a file and a folder and symbolic links might point into the source itself or another union-mounted source, which also has different target versions... It's an outright puzzle. Listing all cases as readable tests and deciding on them would already be a lot of work.
You've raised some very complex but important points. Let me try to break down how I think we can move forward.
But how does UnionMountSource know / decide that the symbolic link points to a folder? Both, the symbolic link / file / folder and the target of the link can have different versions and from different merged sources... You run into the same decidability problems when the link target has multiple versions, existing as a file and a folder and symbolic links might point into the source itself or another union-mounted source, which also has different target versions... It's an outright puzzle. Listing all cases as readable tests and deciding on them would already be a lot of work.
I think the "late binding" approach we discussed is the key to solving this puzzle in a consistent way. The behavior is deterministic: when a path is accessed, we resolve symlinks at that moment, within the context of the merged view.
For example, when resolving merged/subdir0, the link subdir0 -> ./subdir1 becomes a lookup for merged/subdir1. UnionMountSource doesn't need to know upfront whether a symlink points to a file or a folder. The resolution happens dynamically during the lookup process, and it will correctly union the contents if the final resolved path is a directory in multiple branches. I believe this logic, which is implemented in the PR, handles the decidability problem cleanly.
I have also pointed to mergerfs, did you check how it behaves?
I did, but its model for handling these cases is different and doesn't quite fit the late-binding behavior that I think would be most useful here.
FileVersionLayer should always be the top-most layer. At least that's how it was intended to be used. Anything else, or even multiple FileVersionLayer's will probably lead to subtle bugs because it was not intended for that use case.
I agree that we should avoid adding flags unnecessarily. You raised a good point about FileVersionLayer. Here's a thought: what if we take a step-by-step approach?
For now, my PR lets UnionMountSource merge all versions of folder-like symlinks it sees. This provides the core functionality. We can hold off on adding a new flag to control the FileVersionLayer's position or to filter out older versions.
If, in the future, users find that older symlink versions are being surprisingly included in the union mount, we can address that specific complaint by introducing a flag to control this behavior. Since nobody is using this feature yet, we'd just be guessing what users might prefer. This keeps the CLI clean for now and lets real-world usage guide future additions.
The point about tar folder versioning being a bug in ratarmount is interesting, but perhaps it can be handled separately from resolving the symlink behavior in this proposal?
So, to summarize my proposal: let's proceed with the PR by implementing the deterministic late-binding for symlinks, and have UnionMountSource merge all versions of those symlinks by default. We can defer the decision on new flags until we get user feedback.
Does this sound like a reasonable path forward to you?
For example, when resolving merged/subdir0, the link subdir0 -> ./subdir1 becomes a lookup for merged/subdir1. UnionMountSource doesn't need to know upfront whether a symlink points to a file or a folder. The resolution happens dynamically during the lookup process, and it will correctly union the contents if the final resolved path is a directory in multiple branches. I believe this logic, which is implemented in the PR, handles the decidability problem cleanly.
It doesn't feel modular and clean to have this in UnionMountSource, but I think I get it and am getting convinced that it has to be there, unfortunately, probably simply guarded by a flag in the constructor, like follow/resolve_symlinks(_for_folders).
For now, my PR lets UnionMountSource merge all versions of folder-like symlinks it sees. This provides the core functionality. We can hold off on adding a new flag to control the FileVersionLayer's position or to filter out older versions.
If, in the future, users find that older symlink versions are being surprisingly included in the union mount, we can address that specific complaint by introducing a flag to control this behavior. Since nobody is using this feature yet, we'd just be guessing what users might prefer. This keeps the CLI clean for now and lets real-world usage guide future additions.
Sounds reasonable. Which PR? #163 does not touch UnionMountSource and instead implements a separate link-resolving mount source.
The point about tar folder versioning being a bug in ratarmount is interesting, but perhaps it can be handled separately from resolving the symlink behavior in this proposal?
Of course. It doesn't have high priority to me.
So, to summarize my proposal: let's proceed with the PR by implementing the deterministic late-binding for symlinks, and have UnionMountSource merge all versions of those symlinks by default. We can defer the decision on new flags until we get user feedback.
Does this sound like a reasonable path forward to you?
Yes.
Sounds reasonable. Which PR? https://github.com/mxmlnkn/ratarmount/pull/163 does not touch UnionMountSource and instead implements a separate link-resolving mount source.
Right, for now #163 does not touch UnionMountSource but it also includes a union mount implementation. I will merge it with UnionMountSource to remove the duplicated functionality in a new revision of #163.
I also feel like symbolic link resolution has been requested previously, but I can't remember exactly.
I found the commit from over a year ago by chance on the stalled branch: b4e8925ca884f991bf88d37ea77cebbee9932c0a. It was intended for #102.
I just updated #163. The caching strategy in UnionMountSource does not apply to the late-bound symbolic links, so I extracted the common code from union.py to multi.py and kept different caching implementations.