install: does not work in a submodule

Open liljenstolpe opened this issue 3 years ago • 3 comments

Bug Report

Issue name

install: does not work in a submodule

Description

If you run dvc install in a git/DVC repo that is a git submodule, it fails, because in a submodule .git is not a directory but a file pointing to the submodule's directory under .git/modules in the parent.

Reproduce

  1. Create a git repo with DVC installed
  2. Create a parent git repo
  3. Add the DVC repo to the parent git repo as a submodule: git submodule add <dvc-repo>
  4. cd into the submodule and run dvc install

Expected

The DVC hooks should be installed

Environment information

Arch linux, current.

Output of dvc doctor:

DVC version: 2.12.0 (pip)
---------------------------------
Platform: Python 3.10.5 on Linux-5.16.12-arch1-1-x86_64-with-glibc2.35
Supports:
	gdrive (pydrive2 = 1.10.1),
	webhdfs (fsspec = 2022.5.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.5),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.5)
Cache types: hardlink, symlink
Cache directory: zfs on mjolnir/DATA/home
Caches: local
Remotes: gdrive
Workspace directory: zfs on mjolnir/DATA/home
Repo: dvc, git

Additional Information (if any):

Instead of trying to write to .git/hooks, you should try to write to the output of:

❯ git rev-parse --git-path hooks

In my case, the submodule is b225a, and the output of that command is:

/home/cdl/repo/gitlab.com/cdl-images/image-processing/.git/modules/b225a/hooks
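A minimal sketch of that suggestion in Python, assuming git is on PATH. The helper name git_hooks_dir is hypothetical and is not part of DVC or scmrepo; it simply asks Git where the hooks directory is instead of assuming .git/hooks:

```python
import subprocess
from pathlib import Path


def git_hooks_dir(repo_dir: str) -> Path:
    """Return the hooks directory that Git itself reports for repo_dir.

    Hypothetical helper for illustration; not an existing DVC/scmrepo API.
    """
    out = subprocess.run(
        ["git", "rev-parse", "--git-path", "hooks"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    # Git may print a path relative to the directory it was run in,
    # so anchor it there before resolving. An absolute path is kept as-is.
    return (Path(repo_dir) / out).resolve()
```

Called on the submodule checkout, this should return the directory under the parent's .git/modules rather than a path through the .git file.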

Verbose output:

❯ dvc install --verbose
2022-07-03 16:43:41,848 ERROR: unexpected error - [Errno 20] Not a directory: '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.git/hooks'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/commands/install.py", line 14, in run
    self.repo.install(self.args.use_pre_commit_tool)
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 81, in install
    return install_hooks(scm)
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 60, in install_hooks
    scm.install_hook(hook, f"exec dvc git-hook {hook} $@")
  File "/home/cdl/.local/lib/python3.10/site-packages/scmrepo/git/__init__.py", line 242, in install_hook
    self.hooks_dir.mkdir(exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
NotADirectoryError: [Errno 20] Not a directory: '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.git/hooks'
------------------------------------------------------------
2022-07-03 16:43:41,977 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/commands/install.py", line 14, in run
    self.repo.install(self.args.use_pre_commit_tool)
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 81, in install
    return install_hooks(scm)
  File "/home/cdl/.local/lib/python3.10/site-packages/dvc/repo/install.py", line 60, in install_hooks
    scm.install_hook(hook, f"exec dvc git-hook {hook} $@")
  File "/home/cdl/.local/lib/python3.10/site-packages/scmrepo/git/__init__.py", line 242, in install_hook
    self.hooks_dir.mkdir(exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
NotADirectoryError: [Errno 20] Not a directory: '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.git/hooks'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 68, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 28, in _link
    func(from_path, to_path)
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 288, in reflink
    return self.fs.reflink(from_info, to_info)
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/implementations/local.py", line 157, in reflink
    return system.reflink(path1, path2)
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/system.py", line 105, in reflink
    _reflink_linux(source, link_name)
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/system.py", line 91, in _reflink_linux
    fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 127, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/usr/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 76, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/.DXiHZMQF3pUqRr5thYywKV.tmp'
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/.DXiHZMQF3pUqRr5thYywKV.tmp'
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/.DXiHZMQF3pUqRr5thYywKV.tmp'
2022-07-03 16:43:41,977 DEBUG: Removing '/home/cdl/repo/gitlab.com/cdl-images/image-processing/b225a/.dvc/cache/.d9M2fE4ab2y3bC3mUtUUD7.tmp'
2022-07-03 16:43:41,980 DEBUG: Version info for developers:
DVC version: 2.12.0 (pip)
---------------------------------
Platform: Python 3.10.5 on Linux-5.16.12-arch1-1-x86_64-with-glibc2.35
Supports:
	gdrive (pydrive2 = 1.10.1),
	webhdfs (fsspec = 2022.5.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.5),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.5)
Cache types: hardlink, symlink
Cache directory: zfs on mjolnir/DATA/home
Caches: local
Remotes: gdrive
Workspace directory: zfs on mjolnir/DATA/home
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-07-03 16:43:41,980 DEBUG: Analytics is enabled.
2022-07-03 16:43:41,999 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpzmxgq_hw']'
2022-07-03 16:43:42,000 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpzmxgq_hw']'

liljenstolpe • Jul 03 '22 23:07

By the way, my use case is probably a bit different from the one you envisioned. I am actually using DVC for my photography workflow. I was using git-annex, but it was overly complex for what I needed, and git-lfs leaves far too many copies of large files around on clients.

Each 'project' ends up being a git submodule that has DVC enabled, with a gdrive back-end for all of the images. All of the dvc and sidecar files are tracked in git (gitlab backing).

There is an over-arching git project that encloses all of the submodules, has the software tooling, and various other support bits and bobs.

I can clone that overarching repo onto any device (laptop, desktop, etc.) and get my whole archive, then decide which projects to actually run dvc pull on and work on at any given time.

liljenstolpe • Jul 03 '22 23:07

Hi @liljenstolpe, not to dismiss the request, but is there a reason for not running dvc install in the original repo being added as a submodule?

Anyhow, this would need to be handled in scmrepo:

https://github.com/iterative/scmrepo/blob/e339077325eba9c4076710615488c1e5944b4bdb/scmrepo/git/__init__.py#L112-L115

daavoo • Jul 04 '22 10:07

Because the hooks in the original repo are stored in the .git/hooks directory, which isn't part of the actual repo (it's local context, i.e. you have to install them in every clone of the repo). So you have to do it after you add the submodule. However, a submodule's ".git" directory actually lives in the parent's .git/modules/ directory, and the .git entry inside the submodule is just a file pointing there. It looks as if the code hard-codes .git/hooks as the directory, which is the problem.
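To illustrate the layout, here is a rough pure-Python sketch that follows the "gitdir: <path>" pointer Git writes into a submodule's .git file. The function names resolve_git_dir and hooks_dir are hypothetical, not scmrepo's API, and the sketch deliberately ignores core.hooksPath and linked worktrees, so asking git directly is probably the more robust fix:

```python
from pathlib import Path


def resolve_git_dir(worktree: Path) -> Path:
    """Return the real git directory for a checkout.

    Hypothetical helper: handles a plain .git directory and a .git file
    containing a 'gitdir: <path>' pointer (as used by submodules).
    """
    dot_git = worktree / ".git"
    if dot_git.is_dir():
        return dot_git
    content = dot_git.read_text().strip()
    prefix = "gitdir:"
    if not content.startswith(prefix):
        raise ValueError(f"unrecognized .git file: {dot_git}")
    target = Path(content[len(prefix):].strip())
    # A relative pointer is relative to the directory holding the .git file.
    return target if target.is_absolute() else (worktree / target).resolve()


def hooks_dir(worktree: Path) -> Path:
    """Hooks directory derived from the resolved git directory."""
    return resolve_git_dir(worktree) / "hooks"
```

For the b225a submodule, hooks_dir would point inside the parent's .git/modules/b225a directory, which is where mkdir can actually succeed.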

liljenstolpe • Jul 06 '22 02:07