fds
"fds forget" feature proposal
Scenario: You accidentally `git add`'ed or `dvc add`'ed a path that you didn't intend to.
It's a commonly googled question: https://stackoverflow.com/questions/1274057/how-to-make-git-forget-about-a-file-that-was-tracked-but-is-now-in-gitignore
What `fds forget` can add:
- Easier naming - no more googling required
- Automatically detect whether the file is tracked by git or DVC
- Remove the file from DVC cache if it is tracked by DVC (after confirmation from the user)
- Remove the relevant `.dvc` file if it exists, and also make git forget about that file
- More?
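The auto-detection step in the list above could be sketched roughly as follows. This is only an illustration, not fds code: `is_git_tracked`, `dvc_file_for`, and `detect_tracker` are hypothetical names, and the sketch assumes the path was added with a plain `dvc add`, so its metadata sits next to it in `<path>.dvc` (outputs declared in a `dvc.yaml` pipeline are not covered).

```python
import subprocess
from pathlib import Path


def is_git_tracked(path: str) -> bool:
    """Return True if git has `path` in its index (staged or committed)."""
    # `git ls-files --error-unmatch` exits non-zero when the path is not in the index.
    result = subprocess.run(
        ["git", "ls-files", "--error-unmatch", "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def dvc_file_for(path: str) -> Path:
    """Return where `dvc add` would put the .dvc file for `path` (it may not exist)."""
    p = Path(path)
    return p.with_name(p.name + ".dvc")


def detect_tracker(path: str, git_check=is_git_tracked) -> str:
    """Classify `path` as 'dvc', 'git', or 'untracked' (hypothetical helper)."""
    # Check DVC first: a DVC-tracked path is normally git-ignored,
    # while its .dvc file is what git tracks.
    if dvc_file_for(path).exists():
        return "dvc"
    if git_check(path):
        return "git"
    return "untracked"
```

`git_check` is injectable so the classification logic can be exercised without a real repository.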
Hi @guysmoilov
I looked at the git part of this problem. There are two cases:
a) If you have not committed the file yet, a simple `git restore --staged <file>` will do.
b) If you want to untrack a file that has already been committed, it's trickier: `git rm --cached` will delete the file from other people's working copies when they do a `git pull` (you also have to list the file in `.gitignore`). If we do `git update-index --assume-unchanged`, the file no longer shows up in unstaged changes, but I think it remains in the repo.
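A minimal sketch of how a tool might tell these two cases apart and pick the matching command. The function names are hypothetical, and `is_committed` assumes `path` is given relative to the repository root and that the repository already has at least one commit.

```python
import subprocess
from typing import List


def is_committed(path: str) -> bool:
    """Return True if `path` exists in the HEAD commit.

    Assumes `path` is relative to the repository root.
    """
    # `git cat-file -e <object>` exits with 0 only if the object exists.
    result = subprocess.run(
        ["git", "cat-file", "-e", f"HEAD:{path}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def git_untrack_command(path: str, committed: bool) -> List[str]:
    """Build (but do not run) the git command for case (a) or case (b)."""
    if committed:
        # Case (b): removes the file from the index only. Collaborators will
        # still lose their local copy on the next pull, as noted above.
        return ["git", "rm", "--cached", "--", path]
    # Case (a): the file was only staged, so unstaging it is enough.
    return ["git", "restore", "--staged", "--", path]
```

Building the command separately from running it leaves room to show it to the user before anything is changed.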
@indweller Thanks for the research!
Yes, making git forget a committed file is next to impossible for a distributed repo.
As the first line in the issue suggests, I think we should focus on `git add` and `dvc add`: `fds forget` is IMO much easier to remember than `git restore --staged <file>`, and it should also handle removing the file from DVC tracking.
OK, so for the git part it can do `git restore --staged`, and for the DVC part it can do `dvc remove` (https://dvc.org/doc/user-guide/how-to/stop-tracking-data). Can I work on this issue?
@indweller I think you also need to run some form of `dvc gc` after `dvc remove`.
And sure, thank you!
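Putting the thread's suggestions together, the command sequence might look something like the sketch below. It is not the actual fds implementation: `plan_forget` and `run_plan` are hypothetical names, the `.dvc` file is assumed to sit at `<path>.dvc`, and `dvc gc --workspace --force` is just one possible "form of `dvc gc`". Note that this `gc` call drops every cache entry not referenced by the current workspace, not only the forgotten file, which is why the sketch asks for confirmation first.

```python
import subprocess
from typing import Callable, List


def plan_forget(path: str, tracker: str) -> List[List[str]]:
    """Return the commands a hypothetical `fds forget` might run, in order."""
    if tracker == "git":
        return [["git", "restore", "--staged", "--", path]]
    if tracker == "dvc":
        return [
            # Stop tracking: removes the .dvc file and its .gitignore entry.
            ["dvc", "remove", path + ".dvc"],
            # Clean the cache; --force skips DVC's own prompt because
            # run_plan below asks the user itself.
            ["dvc", "gc", "--workspace", "--force"],
        ]
    raise ValueError(f"unknown tracker: {tracker!r}")


def run_plan(
    plan: List[List[str]],
    confirm: Callable[[str], str] = input,
    runner: Callable = subprocess.run,
) -> None:
    """Run each command, asking for confirmation before the cache cleanup."""
    for cmd in plan:
        if cmd[:2] == ["dvc", "gc"]:
            answer = confirm("Also delete unused data from the DVC cache? [y/N] ")
            if answer.strip().lower() != "y":
                continue  # keep the cache untouched
        runner(cmd, check=True)
```

For example, `run_plan(plan_forget("data.csv", "dvc"))` would remove `data.csv.dvc` and then prompt before touching the cache.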
Interesting potentially relevant project: https://rtyley.github.io/bfg-repo-cleaner/