Pull file with symbolic link in path doesn't work
Bug Report
When I run:
dvc pull -f ./data_soft/file.dvc
Where data_soft is a symbolink link to "./data/software"
the ouput of dvc is wrong and think that the file is already pulled
Collecting |0.00 [00:00, ?entry/s]
Fetching
Building workspace index |2.00 [00:00, 639entry/s]
Comparing indexes |3.00 [00:00, 921entry/s]
Applying changes |0.00 [00:00, ?file/s]
Everything is up to date.
When I run:
dvc pull -f ./data/software/file.dvc
the output of dvc is correct and pull the file
Collecting |0.00 [00:00, ?entry/s]
Fetching ^Fetching |Fetching from gs 0/1 [00:00<?, ?file/s]
$ dvc doctor
DVC version: 3.27.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.4.6-76060406-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 2.18.2
dvc_objects = 1.0.1
dvc_render = 0.6.0
dvc_task = 0.3.0
scmrepo = 1.4.0
Supports:
gs (gcsfs = 2023.9.0),
http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3)
Config:
Global: /home/clement/.config/dvc
System: /etc/xdg/xdg-pop/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p7
Caches: local
Remotes: gs
Workspace directory: ext4 on /dev/nvme0n1p7
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/10c2f099136ce26cd5b605984c629317
This has been introduced in 3.10.0 by @efiop
commit 225814fdde149e0f1cf8c33eec7284e57d9a8a20 Author: Ruslan Kuprieiev [email protected] Date: Sat Jul 29 01:51:28 2023 +0300
dvc: don't use realpath where not needed
This is something that was supported accidentally. I don't think we should support symlink to a dvcfile at all. There are potential security concerns regarding that.
This is something that was supported accidentally. I don't think we should support symlink to a dvcfile at all. There are potential security concerns regarding that.
Why accidentaly? There was explicit realpath everywhere in the code...
Security? It's proper to understand when there is a symbolic link and explicitly support it or not with a correct support than doing undefined behavior like what's is happening now, no?
Why accidentaly? There was explicit realpath everywhere in the code...
The realpath was everywhere because dvc_repo.root_dir was also a realpath forcing us to make everything else a realpath, mostly due to macOS having /var symlink to /private/var AFAIR.
I can confirm that it was accidentally supported in the past, but we don't even know how broken it really was.
@clementperon sure, we need to at least gracefully handle that, but I don't think we will support collecting .dvc files inside symlinked directories any time soon. Could you elaborate on your use case please, why do you need a symlink with .dvc files at all? Just trying to see if maybe we could propose a better solution.
@efiop I have reorganized folder hierarchy where I store DVC files.
Instead of rewriting path everywhere I have created symlinks for a smooth transition.
Just sayin but all tools support properly symbolic link, not sure why DVC shouldn't.