Feature request: option to follow symlinks
I have a directory tree structured like the following:
/
├── collections
│ └── training1
│ ├── part1 -> /downloads/source1
│ ├── part2.tar.gz -> /downloads/source2/file1.tar.gz
│ └── part3.zip -> /downloads/source2/file2.zip
├── downloads
│ ├── source1
│ │ ├── file1.tar.gz -> /datastore/c2008e0b
│ │ └── file2.tar.gz -> /datastore/2abf99d6
│ └── source2
│ ├── file1.tar.gz -> /datastore/a126a8d0
│ └── file2.zip -> /datastore/b5a999dd
└── datastore
├── 2abf99d6
├── a126a8d0
├── b5a999dd
└── c2008e0b
I'd like to mount a view of /collections where each compressed file is expanded by ratarmount, recursing on directory symlinks. The naive approach with ratarmount -lr /collections /collections_ratarmounted doesn't work:
collections_ratarmounted
└── training1
├── part1 -> /downloads/source1 # still just a symlink to a non-ratarmounted directory
├── part2.tar.gz # still just a symlink to a non-ratarmounted file
└── part3.zip # expanded, as desired
├── contents_1.jpg
└── ...
Interestingly, it seems that ratarmount is able to follow symlinks, but only some of the time[^errata-file-types].
One could imagine trying to fix this by also ratarmounting /downloads[^errata-archives]. That gives us /downloads_ratarmounted where each archive is correctly expanded, but now we have a new problem: we need to update our view of /collections so that any symlinks into /downloads are transformed to symlinks into /downloads_ratarmounted. I don't see a good way to do this, short of some kind of hacky relative link shenanigans (and then I'm stuck using only relative symlinks, which is a constraint I'd rather not impose if possible because it makes reorganizing files quite error-prone).
Alternatively, one might think to try ratarmounting /downloads onto itself. I have no idea if this is even possible, but if it is, it would mean that applications lose the ability to access the tar files themselves (e.g. to checksum them), which I think would cause problems for my use case.
The simplest solution I can think of is to have some step (either within ratarmount or as an additional FUSE layer before ratarmount) that "flattens" the symlinks into plain old files and directories, so that ratarmount can follow them and expand the target (or the contents of the target directory). I looked around briefly for some existing FUSE application that could flatten symlinks, but I didn't see anything off-the-shelf for it.
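To make concrete what such a "flattening" step would have to cover, here is a minimal sketch that lists every archive reachable from a root directory when symlinks are followed, together with the file each one finally resolves to. It is only an illustration, not part of ratarmount: the function name is made up, `readlink -f` is assumed to be available (GNU coreutils or a recent macOS), and file names containing newlines are not handled.

```shell
# Hypothetical helper: list archives reachable from a root, following symlinks.
# Prints "<path inside the tree> -> <fully resolved target>" for each archive.
list_archives_following_symlinks() {
    local root="$1"
    # -L makes find descend into directory symlinks and apply -type f
    # to the target of a file symlink instead of to the link itself.
    find -L "$root" -type f \( -name '*.tar.gz' -o -name '*.zip' \) -print |
        sort |
        while IFS= read -r path; do
            # readlink -f resolves the whole symlink chain to the final file.
            printf '%s -> %s\n' "$path" "$(readlink -f -- "$path")"
        done
}
```

Running `list_archives_following_symlinks /collections` on the tree above should print one line per archive, including the two files that are only reachable through the part1 directory symlink.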
Is this use case reasonable enough to consider adding a "follow symlinks" option to ratarmount itself? Or is there another approach I'm missing (or a problem I've identified above that actually has an easy fix)?
[^errata-file-types]: An earlier version of this feature request incorrectly claimed that ratarmount could always follow symlinks to files, but further testing showed that this was not always the case. I'm still not entirely sure how I managed to produce one invocation where a tar.gz file symlink wasn't expanded.
[^errata-archives]: An earlier version of the example in this feature request called this directory /archives instead of /downloads, which introduced unnecessary terminology confusion.
Commands to reproduce that folder hierarchy:
mkdir issue-102
cd issue-102
mkdir -p collections/training1 downloads/source{1,2} datastore
cd datastore
for hash in 2abf99d6 a126a8d0 c2008e0b; do echo "$hash" > data; tar -czf "$hash" data; done
echo b5a999dd > data
zip b5a999dd data
mv b5a999dd{.zip,}
rm data
cd ../downloads/source1/
ln -s ../../datastore/c2008e0b file1.tar.gz
ln -s ../../datastore/2abf99d6 file2.tar.gz
cd ../source2/
ln -s ../../datastore/a126a8d0 file1.tar.gz
ln -s ../../datastore/b5a999dd file2.zip
cd ../../collections/training1/
ln -s ../../downloads/source1/ part1
ln -s ../../downloads/source2/file1.tar.gz part2.tar.gz
ln -s ../../downloads/source2/file2.zip part3.zip
I think there are multiple issues that should be fixed in ratarmount:
- [x] Mounting a folder that contains a relative link can break that link if the mount point is at a different hierarchy level. I think it would be fine to redirect the relative link so that it works from the new mount point. E.g., ratarmount -r subfolder/collections mounted would break the part1 link if it were relative: ../../downloads/source1.
- [x] I think recursion should indeed be able to follow symbolic links. The problem is that it would be slightly involved to implement transparently: I would have to change the visible file type of a link to a folder into a plain folder. But then again, that wouldn't be any different from mounting files.
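The relative-link breakage from the first item can be illustrated without mounting anything. The sketch below resolves the same relative target from two different directories; it assumes GNU coreutils `realpath` (for the `-m` option, which does not require the path to exist), and both the function name and the `/data` prefix are hypothetical.

```shell
# Hypothetical helper: show where a relative symlink target points
# when the link is located in the given directory.
resolve_link_from() {
    local link_dir="$1" target="$2"
    # -m normalizes ".." components even if the path does not exist.
    realpath -m -- "$link_dir/$target"
}

# Usage, with /data as a made-up, non-existent base directory:
#   resolve_link_from /data/subfolder/collections/training1 ../../downloads/source1
#     prints /data/subfolder/downloads/source1   (the intended target)
#   resolve_link_from /data/mounted/training1 ../../downloads/source1
#     prints /data/downloads/source1             (a different, wrong location)
```

The same link text therefore points somewhere else once it is seen through a mount point at another depth, which is why the link would need to be rewritten for the mounted view.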
> Interestingly, it seems that ratarmount is able to follow symlinks, but only some of the time.
Unfortunately, I cannot reproduce that part. Could it be that part2.tar.gz is not a valid tar.gz file? Could you try to run ratarmount in foreground mode with -f and without -l and see if there are any warnings regarding that problematic .tar.gz? Try to repeat with -d 2 or -d 3 if there is no output.
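Independently of ratarmount, a quick way to rule out a damaged archive is to resolve the symlink and let tar list its contents. This is only a sketch with a made-up function name; it assumes `readlink -f` and a tar that supports `-z`.

```shell
# Hypothetical helper: check whether a (possibly symlinked) file is a
# readable gzip-compressed tar archive.
check_tar_gz() {
    local path="$1" resolved
    # Follow the whole symlink chain to the file that would actually be opened.
    resolved=$(readlink -f -- "$path") || { echo "unresolvable: $path"; return 1; }
    # Listing the archive fails if it is not valid gzip or not valid tar.
    if tar -tzf "$resolved" >/dev/null 2>&1; then
        echo "ok: $path -> $resolved"
    else
        echo "invalid: $path -> $resolved"
        return 1
    fi
}
```

For the example tree, `check_tar_gz collections/training1/part2.tar.gz` would show which datastore file the link ends at and whether tar can read it.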
> Alternatively, one might think to try ratarmounting /downloads onto itself. I have no idea if this is even possible, but if it is, it would mean that applications lose the ability to access the tar files themselves (e.g. to checksum them), which I think would cause problems for my use case.
This should be possible with ratarmount -lr collections collections. Beware! This completely hung everything that tried to access that mount point! I'm not sure why. I had to use sudo umount -f collections; fusermount -u collections to recover from this.
- [x] The hang does not occur without the --lazy option. I think it enters recursion when it tries to mount a file that has been mounted over. I must take care to access that file through the opened folder file descriptor of the mounted-over folder.
- [x] I would have somehow expected that the mounted-over files would be accessible through the special <file path>.versions/ subfolders, but somehow those are empty for recursive mount points. There should always be at least one entry there for the current version of the file and, in this case, I would expect a second one for the original file.
Even if this worked, I guess that the access through the .versions folder wouldn't be suitable for you.
> I looked around briefly for some existing FUSE application that could flatten symlinks, but I didn't see anything off-the-shelf for it.
This looks like what you are looking for: https://github.com/atomictom/hide-symlinks
I tried it like this:
git clone https://github.com/atomictom/hide-symlinks
cd hide-symlinks
make
export PATH="$PWD:$PATH"
cd [...]issue-102
mkdir collections-without-symlinks
hide-symlinks collections/ collections-without-symlinks
ratarmount -l -r collections-without-symlinks collections-mounted
tree collections-mounted/
Output:
collections-mounted/
└── training1
├── part1
│ ├── file1.tar.gz
│ │ └── data
│ ├── file1.tar.gz.index.sqlite
│ ├── file2.tar.gz
│ │ └── data
│ └── file2.tar.gz.index.sqlite
├── part2.tar.gz
│ └── data
├── part2.tar.gz.index.sqlite
├── part3.zip
│ └── data
└── part3.zip.index.sqlite
6 directories, 8 files
I hope this workaround works for you for now, as fixing the rest will probably have to wait until after Christmas, maybe even until January. I'd start with the hanging process problem because it is the most troublesome bug of all for a user who encounters it: ratarmount cannot even be killed with kill -9 in that state. I had to look up my own answer on stackexchange to recover from it.
I'm not entirely sure when, but the original issue has been fixed and tests have been added in e773e37.