dvc icon indicating copy to clipboard operation
dvc copied to clipboard

get/import: consider lstrip("/") for data path

Open efiop opened this issue 4 years ago • 17 comments

E.g. dvc get https://github.com/user/proj /path/to/data -> dvc get https://github.com/user/proj path/to/data. https://discordapp.com/channels/485586884165107732/485596304961962003/633731542539173889 So far, not seeing any reason against it.

efiop avatar Oct 15 '19 20:10 efiop

it sounds a bit confusing to alter the path that way - /path/to/data is a valid path by itself and is not the same as path/to/data

shcheklein avatar Oct 16 '19 04:10 shcheklein

@shcheklein It is the same for us, because it is a path relative to the dvc repo that we are importing from. It is a safe assumption to prevent errors when users are confused about the concept of root.

efiop avatar Oct 16 '19 10:10 efiop

@efiop I think I don't get the point tbh. /path/to/data is an absolute path that has a very precise meaning. If we don't want to support them (absolute path), let's just fail fast a write an error instead of some magic.

shcheklein avatar Oct 16 '19 17:10 shcheklein

@shcheklein /path/to/data is an absolute path, but when it is supplied as an argument to dvc get/import we can assume that user meant path/to/data, because abs path in this context doesn't make any sense unless the user is confused by the "root" concept and thinks that "root" is the repo he is importing from.

efiop avatar Oct 16 '19 18:10 efiop

kk, I think I misunderstood the intention here, was thinking about -o option for whatever reason (not a path inside the project to pull the artifact from).

shcheklein avatar Oct 16 '19 19:10 shcheklein

Hey @efiop , I have just started contributing to open source. I would like to resolve this issue. Can I know in which file the changes need to be made? Or do I need to search for it.

anshks avatar Oct 22 '19 07:10 anshks

Hi @anshks ! Sure. So you need to path = path.lstrip("/") in https://github.com/iterative/dvc/blob/0.63.4/dvc/repo/imp.py#L5 and in https://github.com/iterative/dvc/blob/0.63.4/dvc/repo/get.py#L24 .

efiop avatar Oct 22 '19 12:10 efiop

Thank you @efiop . Looks like it has been resolved now.

anshks avatar Oct 24 '19 23:10 anshks

Oops, thanks for the heads up!

efiop avatar Oct 24 '19 23:10 efiop

Turns out people are using abs paths O_o https://discuss.dvc.org/t/dvc-get-doesnt-work-with-absolute-paths-but-everything-else-seems-to/249 . Reverted the PR.

efiop avatar Oct 26 '19 20:10 efiop

As an alternative we can try first find an absolute path, after that strip and find the relative one. There will be some really rare edge cases where it can backfire.

Another option would be just notifying users that they probably meant a relative path if absolute is not found and let them fix it

shcheklein avatar Oct 26 '19 20:10 shcheklein

+1 for hint

Suor avatar Jan 18 '20 17:01 Suor

Hi, is this issue open to work on? I would like to take this @efiop @shcheklein I am thinking of using os.path.exists to check if the path is present. This would work if it's an abs path or a relative path without / on left, else doing path = path.lstrip("/") should remove the slash from relative paths. If anyone would like to share some better approach, please do so.

SanchiMittal avatar Oct 31 '20 14:10 SanchiMittal

@SanchiMittal sure, go for it.

Another option would be just notifying users that they probably meant a relative path if absolute is not found and let them fix it

I would personally go with this option though. Any magic can backfire, it's better to be explicit to my mind.

shcheklein avatar Oct 31 '20 18:10 shcheklein

Okay then, shall I raise an error when os.path.exists() returns false? Also how can I check if my changes worked? Is there any specific test for it? Thanks.

SanchiMittal avatar Oct 31 '20 19:10 SanchiMittal

If path is not found it probably already raises some error?

In this case all we need to do is to add a message in case path is an absolute one.

Also how can I check if my changes worked?

There are tests for get/import out there, I hope one of them can be adjusted for this case.

I guess you can first locally run any simple get/import with an absolute path? Here are some examples that can be modified - https://dvc.org/doc/start/data-access

shcheklein avatar Oct 31 '20 19:10 shcheklein

If path is not found it probably already raises some error?

Currently it clones url, then evaluates the absolute path from there in the local file system, so it gives an error message such as Unable to find DVC file with output '../../../data.

(a) If we don't want to support them (absolute path), let's just fail fast a write an error instead of some magic. (b) just notifying users that they probably meant a relative path if absolute is not found and let them fix it

➕1️⃣ for one of these.

jorgeorpinel avatar Jun 20 '21 05:06 jorgeorpinel