dvc
`data:status`: `--with-dirs --untracked` does not return directories for `untracked`
Bug Report
Description
In the extension, we are going to use `dvc data status --show-json --with-dirs --granular --untracked --unchanged` to get the data we need for several locations in the UI. One of those locations is the SCM view.
As you know, it is a common pattern for users to `dvc add` an untracked directory (rather than the individual files within it).
Currently, the above command does not return directories within the `untracked` key of the dict, which in turn breaks one of the SCM workflows (the entire directory cannot be added):
Reproduce
- Navigate to the root of a DVC repo.
- Run the following commands:
mkdir -p ./some/nested && touch ./some/nested/file.txt && touch ./some/nested/script.pl
dvc data status --show-json --with-dirs --granular --untracked --unchanged
{
"untracked": [
"some/nested/file.txt",
"some/nested/script.pl"
],
...
}
Expected
Directories are returned in the `untracked` list.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.15.0 (pip)
---------------------------------
Platform: Python 3.8.9 on macOS-12.2.1-arm64-arm-64bit
Supports:
webhdfs (fsspec = 2022.5.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git
Additional Information (if any):
Also verified the issue with dvc 2.15.1.dev7+gd0eda1d5.
We previously called git with `ls-files --others --exclude-standard --directory --no-empty-directory` to get this information. I'll roll back to using git for the time being. Please LMK whether this issue will be addressed; I am happy to close it (and call git) if this behaviour is acceptable for all other users of the CLI.
Thanks
This is not possible right now for DVC, as it uses git's untracked information and we only get a flattened list.
Soon, we will have support for getting just the root directories too, but to provide both the roots and the items inside untracked directories, we'd have to make two calls: one with `git status --untracked-files=normal` and the other with `git status --untracked-files=all`. I'm not sure we should be making two calls; performance is not going to be an issue here, it just feels odd. `--with-dirs` is only useful for the VS Code extension, so I'm okay with making two calls when `--with-dirs` is passed.
@efiop, @dtrifiro, any thoughts?
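A minimal sketch of the two-call approach, in Python since DVC itself is written in Python. It assumes `git status --porcelain` (v1) output, where untracked entries start with `?? `; with `--untracked-files=normal` a wholly untracked directory is reported as a single `dir/` entry, while `--untracked-files=all` lists every file. The helper names (`parse_untracked`, `run_status`, `untracked_roots_and_files`) are hypothetical and not part of DVC's API, and paths that git quotes (special characters without `-z`) are left as git printed them.

```python
from __future__ import annotations

import subprocess


def parse_untracked(porcelain: str) -> list[str]:
    """Return untracked paths from `git status --porcelain` (v1) output.

    Untracked entries look like "?? <path>"; all other lines are skipped.
    Quoted paths (special characters, no -z) are returned as git printed them.
    """
    return [line[3:] for line in porcelain.splitlines() if line.startswith("?? ")]


def run_status(repo: str, mode: str) -> str:
    # mode is "normal" (untracked dirs collapsed to "dir/") or "all" (every file)
    result = subprocess.run(
        ["git", "status", "--porcelain", f"--untracked-files={mode}"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def untracked_roots_and_files(repo: str) -> tuple[list[str], list[str]]:
    # Two calls: one for the top-level untracked roots, one for every file.
    roots = parse_untracked(run_status(repo, "normal"))
    files = parse_untracked(run_status(repo, "all"))
    return roots, files
```

For the reproduction above, the first list would be expected to contain `some/` and the second `some/nested/file.txt` and `some/nested/script.pl`.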
> one with git status --untracked-files=normal and the other with git status --untracked-files=normal
huh?
We currently make two calls:
git ls-files --others --exclude-standard --directory --no-empty-directory
git ls-files --others --exclude-standard
It does feel odd, but we need that information for this particular workflow.
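The extension-side workaround with the two `git ls-files` calls above can be sketched as follows: run both commands and merge their output. This is an illustrative Python sketch with hypothetical helper names (`git_lines`, `merge_untracked`, `list_untracked`), not the extension's actual code. With `--directory`, git reports only the top-most wholly untracked directory (e.g. `some/`), so a nested directory such as `some/nested/` does not get its own entry; path quoting (`-z`) is ignored here.

```python
from __future__ import annotations

import subprocess


def git_lines(repo: str, *args: str) -> list[str]:
    """Run a git command in `repo` and return its non-empty output lines."""
    out = subprocess.run(
        ["git", *args], cwd=repo, capture_output=True, text=True, check=True
    ).stdout
    return [line for line in out.splitlines() if line]


def merge_untracked(dirs_view: list[str], files_view: list[str]) -> list[str]:
    """Union of both listings, de-duplicated and sorted.

    `dirs_view` holds collapsed untracked directories such as "some/";
    `files_view` holds every untracked file such as "some/nested/file.txt".
    """
    return sorted(set(dirs_view) | set(files_view))


def list_untracked(repo: str) -> list[str]:
    # Call 1: untracked directories collapsed to "dir/", empty dirs skipped.
    dirs_view = git_lines(
        repo, "ls-files", "--others", "--exclude-standard",
        "--directory", "--no-empty-directory",
    )
    # Call 2: every untracked file listed individually.
    files_view = git_lines(repo, "ls-files", "--others", "--exclude-standard")
    return merge_untracked(dirs_view, files_view)
```

Sorting places each directory immediately before its contents, which maps naturally onto a tree in the SCM view.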
@dtrifiro, sorry, I meant `normal` and `all`.
The extension handles this by making an extra call to Git. Closing.
@mattseddon, do you take DVC's root directory into account? I.e., do you avoid showing untracked files that are outside the DVC project?
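One way to handle this on the extension side is to filter Git's untracked paths against the DVC project root before showing them. A minimal sketch, assuming all paths are POSIX-style and relative to the Git root, and that the DVC root is known as a Git-root-relative path (the directory name `project` below is made up); `within_dvc_root` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def within_dvc_root(paths: list[str], dvc_root: str) -> list[str]:
    """Keep only Git-root-relative paths that lie inside `dvc_root`.

    `dvc_root` is also Git-root-relative; "" or "." means the DVC root is
    the Git root, in which case every path is kept.
    """
    root_parts = PurePosixPath(dvc_root).parts  # "" and "." both give ()
    if not root_parts:
        return list(paths)
    kept = []
    for p in paths:
        # Compare whole path components, not string prefixes.
        if PurePosixPath(p).parts[: len(root_parts)] == root_parts:
            kept.append(p)
    return kept
```

Comparing path components rather than string prefixes avoids treating `projectfoo/x.txt` as being inside `project`.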