
`data:status`: `--with-dirs --untracked` does not return directories for `untracked`

Open mattseddon opened this issue 3 years ago • 4 comments

Bug Report

Description

In the extension we are going to use `dvc data status --show-json --with-dirs --granular --untracked --unchanged` to get the data that we need for several locations in the UI. One of those locations is the SCM view.

As you know, it is a normal pattern for users to `dvc add` an untracked directory (rather than the individual files within the directory).

Currently, the above command does not return directories within the `untracked` key of the dict, which in turn breaks one of the SCM workflows (the user cannot add the entire directory).

Reproduce

  1. Navigate to the root of a DVC repo
  2. `mkdir -p ./some/nested && touch ./some/nested/file.txt && touch ./some/nested/script.pl`
  3. `dvc data status --show-json --with-dirs --granular --untracked --unchanged`
{
  "untracked": [
    "some/nested/file.txt",
    "some/nested/script.pl"
  ],
 ...
}

Expected

Directories are returned for untracked.
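
To make the gap concrete, here is a minimal sketch that takes the JSON shown in the reproduction steps and lists the ancestor directories that are absent from the `untracked` key. The function name `ancestor_dirs_not_listed` and the trailing-slash convention for directories are assumptions made for illustration, not DVC behaviour. An ancestor directory may also contain tracked files, so this only illustrates what is missing and is not a substitute for asking git.

```python
import json
from pathlib import PurePosixPath
from typing import List


def ancestor_dirs_not_listed(status_json: str) -> List[str]:
    """List ancestor directories of untracked files that the output omits.

    Hypothetical helper: directories are written with a trailing "/" here,
    which is an assumption rather than a documented DVC convention.
    """
    untracked = json.loads(status_json).get("untracked", [])
    listed = set(untracked)
    dirs = set()
    for path in untracked:
        for parent in PurePosixPath(path).parents:
            # The last parent of a relative path is ".", which is the repo root.
            if str(parent) != ".":
                dirs.add(f"{parent}/")
    return sorted(dirs - listed)


if __name__ == "__main__":
    sample = '{"untracked": ["some/nested/file.txt", "some/nested/script.pl"]}'
    print(ancestor_dirs_not_listed(sample))
```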

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.15.0 (pip)
---------------------------------
Platform: Python 3.8.9 on macOS-12.2.1-arm64-arm-64bit
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git

Additional Information (if any):

Also verified the issue with dvc 2.15.1.dev7+gd0eda1d5.

We previously called git with `ls-files --others --exclude-standard --directory --no-empty-directory` to get this information. I'll roll back to using git for the time being. Please LMK whether or not this issue will be addressed, as I am happy to close (and call git) if this behaviour is acceptable for all other users of the CLI.

Thanks

mattseddon avatar Jul 25 '22 23:07 mattseddon

This is not possible right now for DVC, as it uses git's untracked information and we only get a flattened list.

Soon, we will have support for getting just the root directories too, but to provide both the root and the items inside the untracked directory, we'd have to make two calls, one with `git status --untracked-files=normal` and the other with `git status --untracked-files=all`. I'm not sure we should be making two calls; performance is not going to be an issue here, it just feels odd. `--with-dirs` is only useful for the VS Code extension, so I'm okay with making two calls on `--with-dirs`. @efiop, @dtrifiro, any thoughts?
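
The difference between the two `git status` modes can be sketched roughly as follows. This is an illustration and not DVC's implementation: the helper names `parse_untracked` and `git_untracked` are hypothetical, and the parser assumes the short `--porcelain` format, where untracked entries are prefixed with `??`. Paths that git quotes because of special characters are not handled.

```python
import subprocess
from typing import List


def parse_untracked(porcelain_output: str) -> List[str]:
    """Return the paths of `??` (untracked) entries from porcelain output."""
    return [
        line[3:]
        for line in porcelain_output.splitlines()
        if line.startswith("?? ")
    ]


def git_untracked(mode: str, cwd: str = ".") -> List[str]:
    """Run `git status` with --untracked-files=<mode> ("normal" or "all")."""
    result = subprocess.run(
        ["git", "status", "--porcelain", f"--untracked-files={mode}"],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_untracked(result.stdout)


if __name__ == "__main__":
    # "normal" collapses a fully untracked directory into one entry;
    # "all" lists every untracked file individually.
    print(git_untracked("normal"))
    print(git_untracked("all"))
```

With `normal`, a fully untracked directory is reported as a single entry with a trailing slash, while `all` expands it into its files, which is why both calls would be needed to return directories and their contents together.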

skshetry avatar Aug 08 '22 12:08 skshetry

> one with git status --untracked-files=normal and the other with git status --untracked-files=normal

huh?

dtrifiro avatar Aug 08 '22 17:08 dtrifiro

> This is not possible right now for DVC, as it uses git's untracked information and we only get a flattened list.
>
> Soon, we will have support to just get the root directories too, but to provide both root and items inside the untracked directory, we'd have to make two calls, one with git status --untracked-files=normal and the other with git status --untracked-files=normal. I'm not sure if we should be making two calls, though performance is not going to be a issue here, just feels odd. --with-dirs is only useful for VSCode extension, so I'm okay with making two calls on --with-dirs. @efiop, @dtrifiro, any thoughts?

We currently make two calls:

git ls-files --others --exclude-standard --directory --no-empty-directory
git ls-files --others --exclude-standard

It does feel odd but we need that information for this particular workflow.
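
A rough sketch of how the two `git ls-files` calls above could be combined into one result. The function `untracked_with_dirs`, the `run` parameter, and the returned dict shape are hypothetical and chosen for illustration; the extension's actual code is not shown in this thread. It relies on git printing collapsed directories with a trailing slash when `--directory` is given.

```python
import subprocess
from typing import Callable, Dict, List

BASE = ["git", "ls-files", "--others", "--exclude-standard"]


def _run(args: List[str]) -> str:
    """Run a git command in the current directory and return its stdout."""
    return subprocess.run(
        args, check=True, capture_output=True, text=True
    ).stdout


def untracked_with_dirs(
    run: Callable[[List[str]], str] = _run,
) -> Dict[str, List[str]]:
    """Combine both calls: collapsed directories plus the flat file list."""
    collapsed = run(BASE + ["--directory", "--no-empty-directory"]).splitlines()
    files = run(BASE).splitlines()
    # With --directory, fully untracked directories end with "/".
    dirs = [path for path in collapsed if path.endswith("/")]
    return {"dirs": dirs, "files": files}


if __name__ == "__main__":
    print(untracked_with_dirs())
```

The `run` argument exists only so the merging logic can be exercised without a real repository.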

mattseddon avatar Aug 09 '22 00:08 mattseddon

@dtrifiro, sorry, I meant `normal` and `all`.

skshetry avatar Aug 09 '22 03:08 skshetry

The extension handles this by making an extra call to Git. Closing.

mattseddon avatar Apr 07 '23 03:04 mattseddon

@mattseddon, do you take DVC's root directory into account? That is, do you avoid showing untracked files that are outside the DVC root?
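
One way such a check might look, as a minimal sketch: it assumes the untracked paths are POSIX-style and relative to the git root, and that the DVC root is given relative to the same root. The function name `inside_dvc_root` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def inside_dvc_root(paths: Iterable[str], dvc_root: str) -> List[str]:
    """Keep only the paths located under dvc_root.

    Both arguments are assumed to be relative to the git repository root.
    """
    root = PurePosixPath(dvc_root)
    kept = []
    for path in paths:
        try:
            # relative_to raises ValueError when path is not under root.
            PurePosixPath(path).relative_to(root)
        except ValueError:
            continue
        kept.append(path)
    return kept


if __name__ == "__main__":
    untracked = ["project/some/nested/file.txt", "docs/notes.md"]
    print(inside_dvc_root(untracked, "project"))
```

Comparing path components rather than string prefixes avoids treating a sibling such as `project2/` as if it were inside `project/`.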

skshetry avatar Apr 07 '23 03:04 skshetry