dvc icon indicating copy to clipboard operation
dvc copied to clipboard

Pull file with symbolic link in path doesn't work

Open clementperon opened this issue 2 years ago • 7 comments

Bug Report

When I run: dvc pull -f ./data_soft/file.dvc

Where data_soft is a symbolink link to "./data/software"

the ouput of dvc is wrong and think that the file is already pulled

Collecting                                                                                                                                                                                                                          |0.00 [00:00,    ?entry/s]
Fetching
Building workspace index                                                                                                                                                                                                            |2.00 [00:00,  639entry/s]
Comparing indexes                                                                                                                                                                                                                   |3.00 [00:00,  921entry/s]
Applying changes                                                                                                                                                                                                                    |0.00 [00:00,     ?file/s]
Everything is up to date.

When I run: dvc pull -f ./data/software/file.dvc

the output of dvc is correct and pull the file

Collecting                                                                                                                                                                                                                          |0.00 [00:00,    ?entry/s]
Fetching                                                                                                                                                                                                                                                     ^Fetching       |Fetching from gs                                                                                                                                                                                                    0/1 [00:00<?,     ?file/s]
$ dvc doctor
DVC version: 3.27.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.4.6-76060406-generic-x86_64-with-glibc2.35
Subprojects:
	dvc_data = 2.18.2
	dvc_objects = 1.0.1
	dvc_render = 0.6.0
	dvc_task = 0.3.0
	scmrepo = 1.4.0
Supports:
	gs (gcsfs = 2023.9.0),
	http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3)
Config:
	Global: /home/clement/.config/dvc
	System: /etc/xdg/xdg-pop/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p7
Caches: local
Remotes: gs
Workspace directory: ext4 on /dev/nvme0n1p7
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/10c2f099136ce26cd5b605984c629317

clementperon avatar Oct 27 '23 11:10 clementperon

This has been introduced in 3.10.0 by @efiop

commit 225814fdde149e0f1cf8c33eec7284e57d9a8a20 Author: Ruslan Kuprieiev [email protected] Date: Sat Jul 29 01:51:28 2023 +0300

dvc: don't use realpath where not needed

clementperon avatar Oct 27 '23 13:10 clementperon

This is something that was supported accidentally. I don't think we should support symlink to a dvcfile at all. There are potential security concerns regarding that.

skshetry avatar Oct 30 '23 15:10 skshetry

This is something that was supported accidentally. I don't think we should support symlink to a dvcfile at all. There are potential security concerns regarding that.

Why accidentaly? There was explicit realpath everywhere in the code...

Security? It's proper to understand when there is a symbolic link and explicitly support it or not with a correct support than doing undefined behavior like what's is happening now, no?

clementperon avatar Oct 30 '23 16:10 clementperon

Why accidentaly? There was explicit realpath everywhere in the code...

The realpath was everywhere because dvc_repo.root_dir was also a realpath forcing us to make everything else a realpath, mostly due to macOS having /var symlink to /private/var AFAIR.

skshetry avatar Oct 30 '23 16:10 skshetry

I can confirm that it was accidentally supported in the past, but we don't even know how broken it really was.

@clementperon sure, we need to at least gracefully handle that, but I don't think we will support collecting .dvc files inside symlinked directories any time soon. Could you elaborate on your use case please, why do you need a symlink with .dvc files at all? Just trying to see if maybe we could propose a better solution.

efiop avatar Nov 01 '23 20:11 efiop

@efiop I have reorganized folder hierarchy where I store DVC files.

Instead of rewriting path everywhere I have created symlinks for a smooth transition.

clementperon avatar Nov 01 '23 20:11 clementperon

Just sayin but all tools support properly symbolic link, not sure why DVC shouldn't.

clementperon avatar Nov 27 '23 08:11 clementperon