`dvc list`: handle local repos differently?

Open ahmed-shariff opened this issue 4 years ago • 24 comments

When I run the command dvc list . from any sub-directory of the project I get the following error:

ERROR: failed to list '.' - Failed to clone repo '.' to '/tmp/tmp2qmfnem7dvc-clone': Cmd('git') failed due to: exit code(128)
  cmdline: git clone --no-single-branch -v . /tmp/tmp2qmfnem7dvc-clone
  stderr: 'fatal: repository '.' does not exist
'

It works, though, when executed from the root directory of the project.

DVC: 0.91.1 (arch linux;pip)


UPDATED (@shcheklein):

repurposed it a bit - https://github.com/iterative/dvc/issues/3590#issuecomment-612558038

ahmed-shariff avatar Apr 04 '20 03:04 ahmed-shariff

Hi @ahmed-shariff !

dvc list expects to receive a URL to a git repo, which . is not when you are outside the git repo root, just as you can't git clone that directory. Theoretically, we could check whether the URL you pass is a git repo subdirectory, but I'm not sure it is worth the effort, and it would probably lead to misuse, with people trying to use it as ls in their subdirectories.

efiop avatar Apr 05 '20 19:04 efiop

I see. Thank you for the clarification.

ahmed-shariff avatar Apr 07 '20 01:04 ahmed-shariff

To be honest I think it makes sense to handle this as a special case:

  • don't clone
  • if URL is a local path that is part of the DVC repo - show the result of dvc list <path-to-Git-root> URL. It means if I run dvc list . I just see the files in the current location - it is probably the most expected result.
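
A minimal sketch of the special case proposed above, not DVC's actual implementation. The helper names `find_repo_root` and `resolve_local_target` are hypothetical, and the sketch assumes a repo root can be recognized by the presence of a `.git` entry:

```python
from pathlib import Path
from typing import Optional, Tuple


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk upwards from `start` until a directory containing `.git` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # `.git` may be a directory or a file (worktrees, submodules).
        if (candidate / ".git").exists():
            return candidate
    return None


def resolve_local_target(url: str, cwd: Path) -> Optional[Tuple[Path, str]]:
    """Return (git_root, path_in_repo) for a local path inside a repo, else None."""
    if "://" in url:
        return None  # looks like a remote URL, so the usual clone path applies
    target = (cwd / url).resolve()
    root = find_repo_root(target)
    if root is None:
        return None  # local path, but not inside any Git repo
    return root, target.relative_to(root).as_posix()
```

With a helper like this, `dvc list .` run from a subdirectory could be translated into a listing of that subdirectory relative to the Git root, without cloning anything.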

@iterative/engineering @casperdcl thoughts?

(reopening, since it's annoying to remember the special syntax when I deal with the local repo, and other users were caught by surprise)

shcheklein avatar Apr 12 '20 03:04 shcheklein

@shcheklein I agree

casperdcl avatar Apr 12 '20 10:04 casperdcl

I'm using a local remote and I receive the same error when running dvc list <path-to-local-remote>. Is this expected behaviour?

For reference the "local remote" is actually a mounted network drive.

ERROR: failed to list '/mnt/dr_dvc/vision/dataset_registry' - Failed to clone repo '/mnt/dr_dvc/vision/dataset_registry' to '/tmp/tmp9bkq340cdvc-clone': Cmd('git') failed due to: exit code(128)
  cmdline: git clone --no-single-branch -v /mnt/dr_dvc/vision/dataset_registry /tmp/tmp9bkq340cdvc-clone
  stderr: 'fatal: repository '/mnt/dr_dvc/vision/dataset_registry' does not exist

jamessergeant avatar Apr 23 '20 12:04 jamessergeant

Hi @jamessergeant, dvc list expects the path or URL to the DVC repository itself, not to a remote storage location. In fact I believe it doesn't check remotes at all to produce the list. Whether the data exists in remote storage is not guaranteed by dvc list. You have to attempt dvc get or dvc import to find out.

jorgeorpinel avatar Apr 23 '20 16:04 jorgeorpinel

p.s. guys I'm updating the dvc list cmd ref in iterative/dvc.org/pull/1174

jorgeorpinel avatar Apr 23 '20 17:04 jorgeorpinel

Not sure if this belongs on this issue as well, but this also applies when running dvc list inside of a dvc repo that was created with dvc init --subdir.

andrewcstewart avatar Jul 01 '20 18:07 andrewcstewart

@andrewcstewart Not directly. dvc list support for subrepos will be implemented as a part of https://github.com/iterative/dvc/issues/3369

efiop avatar Jul 01 '20 19:07 efiop

For the record: we are no longer cloning local repos, opening them directly instead. The only thing left is to make the CLI convenient for local use. E.g.

dvc list # should be same as dvc list .
cd subdir && dvc list # should be same as dvc list . subdir
dvc list dir # should be the same as dvc list . dir

This is a bit odd in terms of CLI argument semantics, since it will have to rely on some heuristics, but it should still be pretty convenient. An alternative might be to make the url explicit, similar to early dvc list implementations, e.g. dvc list path_in_repo --url url, but that might be an even harder pill to swallow. Both approaches will raise questions about dvc import/get too, but those are clearly unusual to use locally.

efiop avatar Mar 08 '21 22:03 efiop

Current CLI:

usage: dvc list [-h] [-q | -v] [-R] [--dvc-only] [--rev [<commit>]] url [path]      

and with the proposed heuristics it will be:

usage: dvc list [-h] [-q | -v] [-R] [--dvc-only] [--rev [<commit>]] [url] [path]

and if

  • no url and no path - try to list current directory in the current dvc project
  • no path and url is a local dir or vice-versa - try to list url directory
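
To make the two rules above concrete, here is a rough sketch using `argparse` with both positionals optional. It is only an illustration of the proposed heuristics, not DVC's real argument handling: `build_parser`, `interpret`, and the `"local"`/`"repo"` labels are made up for this example, and the other options from the usage line are omitted.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the proposed usage line: both positionals become optional.
    parser = argparse.ArgumentParser(prog="dvc list", add_help=False)
    parser.add_argument("url", nargs="?")
    parser.add_argument("path", nargs="?")
    return parser


def interpret(argv, cwd):
    """Return (mode, location, path) according to the proposed heuristics."""
    args = build_parser().parse_args(argv)
    if args.url is None and args.path is None:
        # No url and no path: list the current directory of the current project.
        return ("local", cwd, None)
    candidate = os.path.normpath(os.path.join(cwd, args.url))
    if args.path is None and os.path.isdir(candidate):
        # Single argument that is a local directory: list that directory.
        return ("local", candidate, None)
    # Anything else keeps today's meaning: url is a repo, path is inside it.
    return ("repo", args.url, args.path)
```

The sketch also shows where the ambiguity comes from: a single argument is interpreted differently depending on whether a directory of that name happens to exist under the current working directory.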

A problem we are creating here for our future selves: not being able to accept multiple targets (same problem as we have right now in list/get/import, but now worse). An explicit --project/--url/etc. for that would make it clearer. CC @dberenbaum @jorgeorpinel

efiop avatar Mar 08 '21 22:03 efiop

I'm all for unifying the list, get, and import UI.

> not being able to accept multiple targets (same problem as we have right now in list/get/import but now worse). An explicit --project/--url/etc for that would make it clearer

Not seeing a need to list multiple targets. Maybe for get/import, but what about using the import-url interface instead, where url includes location and path? That makes it easy to accept several.

BTW is this issue solved/outdated?

jorgeorpinel avatar Mar 09 '21 00:03 jorgeorpinel

> Not seeing a need to list multiple targets. Maybe for get/import, but what about using the import-url interface instead, where url includes location and path? That makes it easy to accept several.

@jorgeorpinel Doesn't work with git urls.

> BTW is this issue solved/outdated?

Not the last part of it, regarding handling a local path as a target. Hence my questions.

efiop avatar Mar 09 '21 12:03 efiop

> Not seeing a need to list multiple targets.

That's my initial thought. Are we aware of a need for this? If this was a new command, I'd prefer --url, but I wouldn't push to change it if there's no need.

> Both approaches will raise questions about dvc import/get too, but those are clearly unusual to use locally.

By locally, you mean from inside the repo itself? I can't imagine dvc import . path being useful. Or are there other questions these changes raise about import/get?

dberenbaum avatar Mar 09 '21 21:03 dberenbaum

> That's my initial thought. Are we aware of a need for this? If this was a new command, I'd prefer --url, but I wouldn't push to change it if there's no need.

@dberenbaum No requests or anything yet. Just looking into the possible future :slightly_smiling_face:

> By locally, you mean from inside the repo itself? I can't imagine dvc import . path being useful. Or are there other questions these changes raise about import/get?

Yep, from within the project or from another local project.

Btw, another interesting source of confusion is that people tend to pass gs://, s3://, or another dvc remote as the argument instead of a git url. So maybe an explicit --url or, better, --project flag would clear up the confusion in all of the commands. Btw, that would even open up the possibility of unifying import with import-url (and get with get-url) into one command each (dvc import and dvc get), since we'd have an explicit flag to differentiate the use cases of otherwise very similar commands. Though there was some arguing about it even back when it was introduced (wish we had rfcs from back then :wink: ).
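
As a rough illustration of how that mix-up could be detected early (a sketch only, not something DVC does as far as this thread says), the argument can be classified by its scheme. The function name `classify_target` and the scheme sets below are assumptions chosen for the example; in particular, an http(s) URL is treated as a Git URL here even though it could just as well point at storage.

```python
import re
from urllib.parse import urlsplit

# Illustrative scheme sets only; not DVC's actual list of supported remotes.
STORAGE_SCHEMES = {"s3", "gs"}
GIT_SCHEMES = {"http", "https", "ssh", "git"}
SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:")  # e.g. git@github.com:org/repo.git


def classify_target(url: str) -> str:
    """Guess whether `url` is remote storage, a Git URL, or a local path."""
    if SCP_LIKE.match(url):
        return "git-url"
    scheme = urlsplit(url).scheme.lower()
    if scheme in STORAGE_SCHEMES:
        return "storage"
    if scheme in GIT_SCHEMES:
        return "git-url"
    # No recognized scheme: treat it as a path on the local filesystem.
    return "local-path"
```

A command could use such a check to print a targeted hint (for example, that a storage URL was given where a repo was expected) instead of a generic clone failure.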

Anyway, a quick, local and intuitive solution is to go with that [url] [path] solution I've suggested above. If everyone is okay with it, of course.

efiop avatar Mar 09 '21 22:03 efiop

In short, make url optional, default to .? Sounds good, but ideally should apply to get/import* too (for UI consistency).

For the future, if --url helps unify get and import interfaces I'm all for it.

jorgeorpinel avatar Mar 15 '21 03:03 jorgeorpinel

Getting back to this (as I'm playing with it more). It would significantly improve the usability of dvc list locally if we gave it ls semantics (recognizing the cwd automatically).

E.g. I was trying to see what outputs exist in the https://github.com/iterative/get-started-experiments/:

cd data
cd fashion-mnist
dvc list .

It returns root:

.dvcignore
.env
.gitignore
README.md
...
dvc.yaml
src

Trying:

dvc list . .

Also returns root.

dvc list . data/fashion-mnist/prepared

Fails:

ERROR: failed to list '.' - The path 'data/fashion-mnist/prepared' does not exist in the target repository '/Users/ivan/Projects/get-started-experiments' neither as a DVC output nor as a Git-tracked file.

dvc list -R . data/fashion-mnist

Also fails

and so on... To be honest, I'm lost as to how I can list them at this point. It looks like there are a few bugs, plus the behavior is inconsistent depending on the (path, cwd) pair.

shcheklein avatar Jun 06 '21 17:06 shcheklein

I think this will also be part of making list, diff, etc. stable so that they integrate properly with VS Code.

shcheklein avatar Jun 06 '21 17:06 shcheklein

> To be honest, I'm lost as to how I can list them at this point. It looks like there are a few bugs, plus the behavior is inconsistent depending on the (path, cwd) pair.

The workaround for now is to run dvc list $(git rev-parse --show-toplevel) <path relative to git root>.
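
The workaround can also be scripted. The sketch below is one possible way to do it, with hypothetical helper names (`git_root`, `workaround_argv`). It relies on `git rev-parse --show-toplevel`, the built-in Git command that prints the work tree root (`git root` is typically a user-defined alias), and it only builds the `dvc list` argument vector rather than running it:

```python
import os
import subprocess


def git_root(cwd: str) -> str:
    """Ask Git for the top-level directory of the work tree containing `cwd`."""
    out = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def workaround_argv(root: str, cwd: str, target: str = ".") -> list:
    """Build the `dvc list <root> <path relative to root>` argument vector."""
    # Express the target, as seen from cwd, relative to the Git root.
    rel = os.path.relpath(os.path.join(cwd, target), root)
    return ["dvc", "list", root, rel]
```

For example, `workaround_argv(git_root(os.getcwd()), os.getcwd())` produces the arguments for listing the current directory regardless of how deep inside the repo it is.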

skshetry avatar Jun 07 '21 02:06 skshetry

Hi, any update on dvc list for an already cloned repo? It still returns the project's root.

machalx avatar Aug 28 '21 11:08 machalx

@machalx No updates so far, unfortunately 🙁

efiop avatar Aug 29 '21 04:08 efiop

Hi,

Thanks for your work on dvc.

I have a related question: I'm trying to use dvc get path/to/local_dvc_project name_of_file_to_download and I'm getting:

ERROR: failed to get 'name_of_file_to_download' from 'local_dvc_project' - Failed to clone repo 'local_dvc_project'.

Has any work been done on allowing dvc get to be used without git? I saw people wondering about use cases, so here is mine: I want to use dvc to download some files at runtime in a docker container, at which point I have no git credentials. Ideally I would just use boto3, but at the moment I'm not sure how to reconstruct the path to the file I want to download.

Toekan avatar Jan 14 '22 09:01 Toekan

@Toekan let's move the conversation to #7270

pared avatar Jan 19 '22 17:01 pared

> Hi, any update on dvc list for an already cloned repo? It still returns the project's root.

You should be able to list the contents of the relative path now, although the first argument will still be interpreted as the repo url (like dvc list . [relative_path]).

dberenbaum avatar Jul 08 '22 20:07 dberenbaum

I'll close this for now.

efiop avatar Jul 27 '23 03:07 efiop