Negation "!" in .dvcignore doesn't unignore
Bug Report
Issue name
Negation "!", commonly used in .gitignore to "unignore" files, doesn't work in DVC.
Description
Given 2 files:
- ignore.txt
- no-ignore.txt
With this .gitignore file, Git ignores ignore.txt but not no-ignore.txt:
ignore.txt
!no-ignore.txt
However, with the same patterns in .dvcignore, DVC doesn't "unignore" no-ignore.txt.
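For reference, the behaviour expected here is the gitignore rule that the last matching pattern decides, and a match on a `!` pattern means "not ignored". Below is a minimal sketch of that rule using only Python's standard `fnmatch`. The helper `is_ignored` is hypothetical and written for illustration; it is neither Git's nor DVC's implementation, and it handles only plain file names (no directories or anchored patterns).

```python
from fnmatch import fnmatchcase


def is_ignored(name, patterns):
    """Return True if the last pattern matching `name` is not a negation."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(name, glob):
            # A later match overrides any earlier one.
            ignored = not negated
    return ignored


patterns = ["ignore.txt", "!no-ignore.txt"]
for name in ["ignore.txt", "no-ignore.txt"]:
    print(name, is_ignored(name, patterns))
# ignore.txt True
# no-ignore.txt False
```

Under this rule only ignore.txt is reported as ignored, which is what git check-ignore prints in the reproduction below.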
Reproduce
Output shown in comments
touch ignore.txt no-ignore.txt
echo -e 'ignore.txt\n!no-ignore.txt' > .dvcignore
echo -e 'ignore.txt\n!no-ignore.txt' > .gitignore
cat .dvcignore
# ignore.txt
# !no-ignore.txt
git check-ignore ignore.txt no-ignore.txt
# ignore.txt
dvc check-ignore ignore.txt no-ignore.txt
# ignore.txt
# no-ignore.txt
Expected
Ideally, git check-ignore and dvc check-ignore should produce the same output for the same patterns.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 3.30.1 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
dvc_data = 2.22.0
dvc_objects = 1.2.0
dvc_render = 0.6.0
dvc_task = 0.3.0
scmrepo = 1.4.1
Supports:
gs (gcsfs = 2023.10.0),
http (aiohttp = 3.9.0, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.0, aiohttp-retry = 2.8.3)
Config:
Global: /home/jc/.config/dvc
System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: 9p on drvfs
Caches: local
Remotes: gs
Workspace directory: 9p on drvfs
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/67bc00a31a88271f0f2653ea5494f098
Additional Information (if any):
@Eve-ning I think the dvcignore feature (i.e. the pattern matching) is working fine; the bug is in the dvc check-ignore command. dvc add does unignore the files marked with !. You can try the following.
file: .dvcignore
to_ignore*
!to_ignore-NOT.txt
$ mkdir data
$ cat '1' > data/to_ignore.txt
$ cat '1' > data/to_ignore-NOT.txt
$ dvc add data
$ tree .dvc/cache/files/md5
└── c1
└── ba58b05f6245f221ad65391fa6690b <<-- md5 for data/to_ignore-NOT.txt is added to cache
$ md5 data/*
MD5 (data/to_ignore-NOT.txt) = c1ba58b05f6245f221ad65391fa6690b
MD5 (data/to_ignore.txt) = 919d117956d3135c4c683ff021352f5c
It looks like this is the expected behaviour of check-ignore. This is the test for the behaviour.
@dberenbaum Do you think this test is correct, or am I missing something here? Tagging you since the last activity on this ticket was yours.
Yes, I think that is correct. It looks related to a previous issue in https://github.com/iterative/dvc/issues/5046. DVC uses both of these methods and they don't seem to always be consistent:
https://github.com/iterative/dvc/blob/9b5772fab8ad6ca7e885c97d094043b6ac2e34a9/dvc/ignore.py#L395-L409
https://github.com/iterative/dvc/blob/9b5772fab8ad6ca7e885c97d094043b6ac2e34a9/dvc/ignore.py#L411-L424
git check-ignore and dvc check-ignore behave differently for the same ignore patterns.
For a .dvcignore and a .gitignore that both contain:
data/data1
to_ignore*
!to_ignore-NOT.txt
the following outputs differ:
dvc-demo-2 on master [!?] via 🐍 v3.11.4 (.venv)
❯ git check-ignore data/*
data/data1
data/to_ignore.txt
dvc-demo-2 on master [!?] via 🐍 v3.11.4 (.venv)
❯ dvc check-ignore data/*
== I am LOCAL DVC ==
data/data1
data/to_ignore-NOT.txt
data/to_ignore.txt
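The two listings above are consistent with two different decision rules: reporting a path when any pattern matches it (negated or not), versus reporting it only when the last matching pattern is not negated. The sketch below contrasts the two on the same patterns and paths. It is an illustration under that assumption, not a reading of DVC's code; the helper names are hypothetical and the matching is simplified (patterns containing a slash are compared against the whole path, others against the base name).

```python
from fnmatch import fnmatchcase
from posixpath import basename

PATTERNS = ["data/data1", "to_ignore*", "!to_ignore-NOT.txt"]
PATHS = ["data/data1", "data/to_ignore-NOT.txt", "data/to_ignore.txt"]


def matching_patterns(path, patterns):
    """List every pattern, negated or not, that matches `path`."""
    hits = []
    for pattern in patterns:
        glob = pattern[1:] if pattern.startswith("!") else pattern
        # Simplification: slash patterns see the full path, others the base name.
        subject = path if "/" in glob else basename(path)
        if fnmatchcase(subject, glob):
            hits.append(pattern)
    return hits


def any_match(path, patterns):
    """Report the path as soon as any pattern matches, ignoring negation."""
    return bool(matching_patterns(path, patterns))


def last_match_wins(path, patterns):
    """Report the path only if the last matching pattern is not negated."""
    hits = matching_patterns(path, patterns)
    return bool(hits) and not hits[-1].startswith("!")


print([p for p in PATHS if any_match(p, PATTERNS)])
# ['data/data1', 'data/to_ignore-NOT.txt', 'data/to_ignore.txt']
print([p for p in PATHS if last_match_wins(p, PATTERNS)])
# ['data/data1', 'data/to_ignore.txt']
```

The first list has the same entries as the dvc check-ignore output and the second the same as the git check-ignore output, so a fix would need check-ignore to take the sign of the final match into account.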
@dberenbaum This definitely seems like a bug. I believe the test cases are wrong and need to be updated. Can the team confirm?
@anunayasri good catch. Yes, I think DVC check-ignore should behave the same way as git.
I believe the test cases are wrong and need to be updated. Can the team confirm?
Only the test case, or the implementation as well?
@shcheklein Of course, we have to update both the test case and the implementation. By test case I meant that the expected behaviour is misaligned.
I am working on another issue and will try to fix this after that.