`dvc list`: handle local repos differently?
When I run the command `dvc list .` from any subdirectory of the project, I get the following error:
ERROR: failed to list '.' - Failed to clone repo '.' to '/tmp/tmp2qmfnem7dvc-clone': Cmd('git') failed due to: exit code(128)
cmdline: git clone --no-single-branch -v . /tmp/tmp2qmfnem7dvc-clone
stderr: 'fatal: repository '.' does not exist
'
It works, though, when executed from the root directory of the project.
DVC: 0.91.1 (arch linux;pip)
UPDATED (@shcheklein): repurposed it a bit, see https://github.com/iterative/dvc/issues/3590#issuecomment-612558038
Hi @ahmed-shariff!
`dvc list` expects a URL to a Git repo, which `.` isn't when you are not in the Git repo root, just as you can't `git clone` that directory. Theoretically, we could check whether the URL you pass is a subdirectory of a Git repo, but I'm not sure it is worth the effort, and it would probably also lead to misuse, with people trying to use it as `ls` in their subdirectories.
I see. Thank you for the clarification.
To be honest, I think it makes sense to handle this as a special case:
- don't clone
- if URL is a local path that is part of the DVC repo, show the result of `dvc list <path-to-Git-root> URL`.

This means that if I run `dvc list .`, I just see the files in the current location, which is probably the most expected result.
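A minimal sketch of that special case, assuming "part of the repo" simply means "some ancestor directory contains `.git`": walk up from the given local path to the Git root, then turn the path into one relative to that root, i.e. the two arguments `dvc list <path-to-Git-root> <relative path>` would receive. `find_repo_root` and `resolve_local_target` are hypothetical helpers written for illustration, not DVC's implementation.

```python
import os
from typing import Optional, Tuple


def find_repo_root(start: str) -> Optional[str]:
    """Walk up from `start` until a directory containing `.git` is found."""
    current = os.path.abspath(start)
    while True:
        # `.git` may be a directory, or a file for worktrees/submodules.
        if os.path.exists(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent


def resolve_local_target(url: str) -> Tuple[str, str]:
    """Hypothetical helper: map `url` to (repo_root, path_relative_to_root).

    Only existing local directories inside a Git repo are rewritten; anything
    else (e.g. a remote Git URL) is returned unchanged with an empty path.
    """
    if not os.path.isdir(url):
        return url, ""
    root = find_repo_root(url)
    if root is None:
        return url, ""
    rel = os.path.relpath(os.path.abspath(url), root)
    # relpath returns "." for the root itself; list the whole repo then.
    return root, ("" if rel == os.curdir else rel)
```

Under these assumptions, calling it with `.` from `data/fashion-mnist` inside a clone would yield the clone's root plus `data/fashion-mnist`, while a remote Git URL passes through untouched and would keep the existing clone-based behaviour.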
@iterative/engineering @casperdcl thoughts?
(reopening, since it's annoying to remember the special syntax when I deal with the local repo, and other users were caught by surprise)
@shcheklein I agree
I'm using a local remote, and I receive the same error when running `dvc list <path-to-local-remote>`. Is this expected behaviour?
For reference, the "local remote" is actually a mounted network drive.
ERROR: failed to list '/mnt/dr_dvc/vision/dataset_registry' - Failed to clone repo '/mnt/dr_dvc/vision/dataset_registry' to '/tmp/tmp9bkq340cdvc-clone': Cmd('git') failed due to: exit code(128)
cmdline: git clone --no-single-branch -v /mnt/dr_dvc/vision/dataset_registry /tmp/tmp9bkq340cdvc-clone
stderr: 'fatal: repository '/mnt/dr_dvc/vision/dataset_registry' does not exist
Hi @jamessergeant, `dvc list` expects the path or URL to the DVC repository itself, not to a remote storage location. In fact, I believe it doesn't check remotes at all to produce the list, so `dvc list` does not guarantee that the data exists in remote storage. You have to attempt `dvc get` or `dvc import` to find out.
P.S. guys, I'm updating the `dvc list` command reference in iterative/dvc.org/pull/1174.
Not sure if this belongs on this issue as well, but this also applies when running `dvc list` inside a DVC repo that was created with `dvc init --subdir`.
@andrewcstewart Not directly. `dvc list` support for subrepos will be implemented as part of https://github.com/iterative/dvc/issues/3369
For the record: we are no longer cloning local repos; we open them directly instead. The only thing left is to make the CLI convenient for local use, e.g.:
dvc list # should be same as dvc list .
cd subdir && dvc list # should be same as dvc list . subdir
dvc list dir # should be the same as dvc list . dir
It is a bit odd from the point of view of CLI argument semantics, as it will have to rely on some heuristics, but it should still be pretty convenient. An alternative might be to make the URL explicit, similar to early `dvc list` implementations, e.g. `dvc list path_in_repo --url url`, but that might be an even harder pill to swallow. Both approaches will raise questions about `dvc import/get` too, but those are clearly unusual to use locally.
Current CLI:
usage: dvc list [-h] [-q | -v] [-R] [--dvc-only] [--rev [<commit>]] url [path]
And with the proposed heuristics, it will be:
usage: dvc list [-h] [-q | -v] [-R] [--dvc-only] [--rev [<commit>]] [url] [path]
And if:
- no `url` and no `path`: try to list the current directory in the current DVC project
- no `path` and `url` is a local dir (or vice versa): try to list the `url` directory
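The `[url] [path]` proposal can be sketched with Python's `argparse`, where both positionals become optional through `nargs="?"`. This is an illustration under assumptions, not DVC's actual parser: only a few flags from the usage line are included, and `resolve_target` is a hypothetical function returning `(repo_url, directory)`, with `None` standing for "the local project that contains `directory`".

```python
import argparse
import os
from typing import Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Proposed: dvc list [-R] [--dvc-only] [--rev [<commit>]] [url] [path]
    parser = argparse.ArgumentParser(prog="dvc list")
    parser.add_argument("-R", dest="recursive", action="store_true")
    parser.add_argument("--dvc-only", action="store_true")
    parser.add_argument("--rev", nargs="?", metavar="<commit>")
    parser.add_argument("url", nargs="?")   # now optional
    parser.add_argument("path", nargs="?")  # still relative to the repo root
    return parser


def resolve_target(
    url: Optional[str], path: Optional[str]
) -> Tuple[Optional[str], str]:
    """Hypothetical heuristics; a repo of None means "the local project"."""
    if url is None and path is None:
        # No arguments: list the current directory of the current project.
        return None, os.curdir
    if path is None and url is not None and os.path.isdir(url):
        # A single argument that is a local directory: list that directory.
        return None, url
    # Otherwise keep today's semantics: `url` is a repo, `path` is inside it.
    return url, path or ""
```

Note that with this heuristic `dvc list .` lists the current directory rather than the repo root, and a local path to another Git repo (such as a mounted dataset registry) would also be treated as a plain directory, which is exactly the kind of guessing the heuristic approach implies.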
A problem that we are creating here for future us: not being able to accept multiple targets (same problem as we have right now in list/get/import, but now worse). An explicit `--project`/`--url`/etc. flag for that would make it clearer. CC @dberenbaum @jorgeorpinel
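For comparison, a sketch of the explicit-flag alternative, again with `argparse`: once the repository is named by a flag, the positionals are free to take several paths. `--url` here is the hypothetical flag floated in this thread (it could equally be `--project`), not an option `dvc list` is known to have, and defaulting it to `.` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout: dvc list [--url URL] [path ...]
    parser = argparse.ArgumentParser(prog="dvc list")
    # The repo is named explicitly, so it can never be mistaken for a path.
    parser.add_argument("--url", default=".", help="Git repo URL or local path")
    # Every positional is a target inside that repo, so several are allowed.
    parser.add_argument("paths", nargs="*", help="paths relative to the repo root")
    return parser
```

The trade-off is backward compatibility: an existing call such as `dvc list <url> <path>` would be parsed as two paths under this layout.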
I'm all for unifying the list, get, and import UI.
> not being able to accept multiple targets (same problem as we have right now in list/get/import, but now worse). An explicit `--project`/`--url`/etc. flag for that would make it clearer
Not seeing a need to list multiple targets. Maybe get/import, but what about using the `import-url` interface instead, where `url` includes both location and path? That makes it easy to accept several of them.
BTW, is this issue solved/outdated?
> Not seeing a need to list multiple targets. Maybe get/import, but what about using the `import-url` interface instead, where `url` includes both location and path? That makes it easy to accept several of them.
@jorgeorpinel That doesn't work with Git URLs.
> BTW, is this issue solved/outdated?
Not the last part of it, regarding handling a local path as a target. Hence my questions.
> Not seeing a need to list multiple targets.
That's my initial thought. Are we aware of a need for this? If this was a new command, I'd prefer `--url`, but I wouldn't push to change it if there's no need.
> Both approaches will raise questions about `dvc import/get` too, but those are clearly unusual to use locally.
By locally, you mean from inside the repo itself? I can't imagine `dvc import . path` being useful. Or are there other questions these changes raise about import/get?
> That's my initial thought. Are we aware of a need for this? If this was a new command, I'd prefer `--url`, but I wouldn't push to change it if there's no need.
@dberenbaum No requests or anything yet. Just looking into the possible future :slightly_smiling_face:
> By locally, you mean from inside the repo itself? I can't imagine `dvc import . path` being useful. Or are there other questions these changes raise about import/get?
Yep, from within the project or from another local project.
Btw, another interesting source of confusion is that people tend to pass `gs://`, `s3://`, or other DVC remote URLs as the argument instead of a Git URL. So maybe an explicit `--url` or, better, `--project` flag would clear up the confusion in all of the commands. Btw, that would even open up the possibility of unifying `import`/`import-url` (and `get`/`get-url`) into one command each (`dvc import` and `dvc get`) in the future, since we'd have an explicit flag to differentiate the use cases of otherwise very similar commands. Though there was some arguing about it even back when it was introduced (wish we had RFCs from back then :wink:).
Anyway, a quick, local, and intuitive solution is to go with the `[url] [path]` approach I've suggested above, if everyone is okay with it, of course.
In short, make `url` optional, defaulting to `.`? Sounds good, but ideally this should apply to get/import* too (for UI consistency).
For the future, if `--url` helps unify the get and import interfaces, I'm all for it.
Getting back to this (as I'm playing more with it): it would significantly improve the usability of `dvc list` locally if we gave it `ls` semantics (recognizing the cwd automatically).
E.g. I was trying to see what outputs exist in https://github.com/iterative/get-started-experiments/:
cd data
cd fashion-mnist
dvc list .
It returns the root:
.dvcignore
.env
.gitignore
README.md
...
dvc.yaml
src
Trying:
dvc list . .
Also returns the root.
dvc list . data/fashion-mnist/prepared
Fails:
ERROR: failed to list '.' - The path 'data/fashion-mnist/prepared' does not exist in the target repository '/Users/ivan/Projects/get-started-experiments' neither as a DVC output nor as a Git-tracked file.
dvc list -R . data/fashion-mnist
Also fails
And so on... To be honest, I'm lost as to how I can list them at this point. It looks like there are a few bugs, plus behavior that is inconsistent depending on the (path, cwd) pair.
I think this will also be part of making `list`, `diff`, etc. stable enough to integrate properly with VS Code.
> To be honest, I'm lost as to how I can list them at this point. It looks like there are a few bugs, plus behavior that is inconsistent depending on the (path, cwd) pair.
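The workaround for now is to run `dvc list $(git root) <path relative to git root>`. (`git root` is not a built-in Git command but a common alias; `git rev-parse --show-toplevel` prints the same path.)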
Hi, any update on `dvc list` for an already cloned repo? It still returns the project's root.
@machalx No updates so far, unfortunately 🙁
Hi, thanks for your work on DVC.
I have a related question: I'm trying to use `dvc get path/to/local_dvc_project name_of_file_to_download` and I'm getting:
ERROR: failed to get 'name_of_file_to_download' from 'local_dvc_project' - Failed to clone repo 'local_dvc_project'.
Has there been any work on allowing `dvc get` to be used without Git? I saw people asking about use cases, so here's mine: I want to use DVC to download some files at runtime inside a Docker container, at which point I have no Git credentials. Ideally I would just use boto3, but at the moment I'm not sure how to reconstruct the path to the file I want to download.
@Toekan let's move the conversation to #7270
> Hi, any update on `dvc list` for an already cloned repo? It still returns the project's root.
You should be able to list the contents of the relative path now, although the first argument will still be interpreted as the repo URL (like `dvc list . [relative_path]`).
I'll close this for now.