install: does not work in a submodule
Bug Report
Issue name
install: does not work in a submodule
Description
Running dvc install in a git/DVC repo that is itself a git submodule fails, because in a submodule .git is not a directory: it is a file that points to the repo's directory under .git/modules in the parent.
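To illustrate the failure mode, here is a minimal sketch that builds a fake parent/submodule layout in a temporary directory (no git or DVC involved). It shows that creating .git/hooks directly fails with the same error class as in the traceback further down, and that following the "gitdir:" pointer stored in the .git file gives a usable location. The names resolve_git_dir and demo are hypothetical helpers for this example, not part of DVC or scmrepo.

```python
import tempfile
from pathlib import Path


def resolve_git_dir(worktree: Path) -> Path:
    """Return the real git directory for a worktree (hypothetical helper).

    In a regular clone `.git` is a directory. In a submodule it is a
    one-line text file of the form `gitdir: <path>`.
    """
    dot_git = worktree / ".git"
    if dot_git.is_dir():
        return dot_git
    content = dot_git.read_text().strip()
    prefix = "gitdir:"
    if not content.startswith(prefix):
        raise ValueError(f"unrecognized .git file: {dot_git}")
    target = Path(content[len(prefix):].strip())
    # A relative gitdir is relative to the directory holding the .git file;
    # joining with an absolute path simply yields that absolute path.
    return (worktree / target).resolve()


def demo() -> tuple[str, Path]:
    """Build a fake parent/submodule layout and compare both approaches."""
    with tempfile.TemporaryDirectory() as tmp:
        parent = Path(tmp).resolve() / "parent"
        sub = parent / "sub"
        (parent / ".git" / "modules" / "sub").mkdir(parents=True)
        sub.mkdir()
        # This mimics what git writes into a submodule's working tree.
        (sub / ".git").write_text("gitdir: ../.git/modules/sub\n")

        # Naive approach: treat .git as a directory and create hooks in it.
        try:
            (sub / ".git" / "hooks").mkdir(exist_ok=True)
            error = "no error"
        except NotADirectoryError as exc:
            error = type(exc).__name__

        # Following the pointer first gives a directory we can write to.
        hooks_dir = resolve_git_dir(sub) / "hooks"
        hooks_dir.mkdir(exist_ok=True)
        return error, hooks_dir.relative_to(parent)
```

Parsing the .git file by hand is only meant to show what is going on. Asking git itself for the path is likely the more robust option.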
Reproduce
- Create a git repo with DVC initialized
- Create a parent git repo
- Add the DVC repo to the parent git repo as a submodule:
git submodule add <dvc-repo>
- cd into the submodule and attempt:
dvc install
Expected
The DVC hooks should be installed
Environment information
Arch Linux, current.
Output of dvc doctor:
DVC version: 2.12.0 (pip)
---------------------------------
Platform: Python 3.10.5 on Linux-5.16.12-arch1-1-x86_64-with-glibc2.35
Supports:
gdrive (pydrive2 = 1.10.1),
webhdfs (fsspec = 2022.5.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.5),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.5)
Cache types: hardlink, symlink
Cache directory: zfs on mjolnir/DATA/home
Caches: local
Remotes: gdrive
Workspace directory: zfs on mjolnir/DATA/home
Repo: dvc, git
Additional Information (if any):
Instead of writing to .git/hooks, DVC should write to the directory printed by:
❯ git rev-parse --git-path hooks
In my case, the submodule is b225a, and the output of that command is:
/home/cdl/repo/gitlab.com/cdl-images/image-processing/.git/modules/b225a/hooks
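A minimal sketch of that suggestion in Python, assuming git is on PATH. The function names git_stdout and resolve_hooks_dir and the run_git parameter are hypothetical and are not scmrepo's actual API. The only git invocation used is the one shown above, git rev-parse --git-path hooks.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def git_stdout(args: Sequence[str], cwd: Path) -> str:
    """Run a git command in `cwd` and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def resolve_hooks_dir(
    worktree: Path,
    run_git: Callable[[Sequence[str], Path], str] = git_stdout,
) -> Path:
    """Ask git where hooks live instead of assuming `<worktree>/.git/hooks`."""
    raw = run_git(["rev-parse", "--git-path", "hooks"], worktree)
    # git may print a path relative to the working directory (for example
    # ".git/hooks" in a plain clone), so anchor it at `worktree`. Joining
    # with an absolute path simply yields that absolute path.
    return worktree / raw
```

Called as resolve_hooks_dir(Path.cwd()) inside the submodule, this should return the .git/modules/... path shown above rather than a path under the .git file. The run_git parameter exists only so the lookup can be swapped out in tests.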
verbose output:
❯ dvc install --verbose
2022-07-03 16:43:41,848 ERROR: unexpected error - [Errno 20] Not a directory: '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.git/hooks'
------------------------------------------------------------
Traceback (most recent call last):
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/__init__.py", line 185, in main
ret = cmd.do_run()
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/commands/install.py", line 14, in run
self.repo.install(self.args.use_pre_commit_tool)
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 81, in install
return install_hooks(scm)
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 60, in install_hooks
scm.install_hook(hook, f"exec dvc git-hook {hook} $@")
File "/home/cdl/.local/lib/python3.10/site-packages/scmrepo/git/__init__.py", line 242, in install_hook
self.hooks_dir.mkdir(exist_ok=True)
File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
self._accessor.mkdir(self, mode)
NotADirectoryError: [Errno 20] Not a directory: '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.git/hooks'
------------------------------------------------------------
2022-07-03 16:43:41,977 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/__init__.py", line 185, in main
ret = cmd.do_run()
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/commands/install.py", line 14, in run
self.repo.install(self.args.use_pre_commit_tool)
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 81, in install
return install_hooks(scm)
File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 60, in install_hooks
scm.install_hook(hook, f"exec dvc git-hook {hook} $@")
File "/home/cdl/.local/lib/python3.10/site-packages/scmrepo/git/__init__.py", line 242, in install_hook
self.hooks_dir.mkdir(exist_ok=True)
File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
self._accessor.mkdir(self, mode)
NotADirectoryError: [Errno 20] Not a directory: '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.git/hooks'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 68, in _try_links
return _link(link, from_fs, from_path, to_fs, to_path)
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 28, in _link
func(from_path, to_path)
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 288, in reflink
return self.fs.reflink(from_info, to_info)
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/implementations/local.py", line 157, in reflink
return system.reflink(path1, path2)
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/system.py", line 105, in reflink
_reflink_linux(source, link_name)
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/system.py", line 91, in _reflink_linux
fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 127, in _test_link
_try_links([link], from_fs, from_file, to_fs, to_file)
File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 76, in _try_links
raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/.DXiHZMQF3pUqRr5thYywKV.tmp'
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/.DXiHZMQF3pUqRr5thYywKV.tmp'
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/.DXiHZMQF3pUqRr5thYywKV.tmp'
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.dvc/cache/.d9M2fE4ab2y3bC3mUtUUD7.tmp'
2022-07-03 16:43:41,980 DEBUG: Version info for developers:
DVC version: 2.12.0 (pip)
---------------------------------
Platform: Python 3.10.5 on Linux-5.16.12-arch1-1-x86_64-with-glibc2.35
Supports:
gdrive (pydrive2 = 1.10.1),
webhdfs (fsspec = 2022.5.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.5),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.5)
Cache types: hardlink, symlink
Cache directory: zfs on mjolnir/DATA/home
Caches: local
Remotes: gdrive
Workspace directory: zfs on mjolnir/DATA/home
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-07-03 16:43:41,980 DEBUG: Analytics is enabled.
2022-07-03 16:43:41,999 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpzmxgq_hw']'
2022-07-03 16:43:42,000 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpzmxgq_hw']'
By the way, my use case is probably a bit different from the one you envisioned. I am actually using DVC for my photography workflow. I was using git-annex, but it was overly complex for what I needed, and git-lfs leaves far too many copies of large files around on clients.
Each 'project' ends up being a git submodule that has DVC enabled, with a gdrive back-end for all of the images. All of the dvc and sidecar files are tracked in git (gitlab backing).
There is an over-arching git project that encloses all of the submodules, has the software tooling, and various other support bits and bobs.
I can clone that overarching repo onto any device (laptop, desktop, etc.) and get my whole archive, then decide which projects to actually run dvc pull on and work on at any given time.
Hi @liljenstolpe, not to dismiss the request, but is there a reason for not running dvc install in the original repo that is being added as a submodule?
Anyhow, this would need to be handled in scmrepo:
https://github.com/iterative/scmrepo/blob/e339077325eba9c4076710615488c1e5944b4bdb/scmrepo/git/__init__.py#L112-L115
Because the hooks in the original repo are stored in .git/hooks, a directory that isn't part of the tracked repo contents (it's local context, i.e. you have to install them in every clone of the repo). So you have to run dvc install after you add the submodule. However, submodule ".git" directories are actually in the parent's .git/modules/