
gc: fails when attempting to remove cache shared by multiple projects

Open dcdanko opened this issue 3 years ago • 10 comments

Bug Report

Issue name

gc: fails when attempting to remove cache shared by multiple projects

Description

When attempting to garbage collect a cache shared by multiple projects, dvc gc fails with an error saying it is attempting to write a readonly database.

Reproduce

I don't have multiple dvc repos to reproduce on

Expected

dvc performs gc as normal

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.9.5 (pip)
---------------------------------
Platform: Python 3.8.1 on Linux-5.17.5-76051705-generic-x86_64-with-glibc2.10
Supports:
	azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.8.0),
	webhdfs (fsspec = 2022.2.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2022.2.0, boto3 = 1.20.24)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/fastdata-fastlv
Caches: local
Remotes: local, s3, local
Workspace directory: xfs on /dev/mapper/fastdata-fastlv
Repo: dvc, git

Additional Information (if any):

$ dvc gc -v -w -p . ../../dcdanko/bdx2 ../../papciak/Biotia-DX/ ../../tpaisie/bdx/ ../../ahmadazim/Biotia-DX/ ../../hwells/Biotia-DX/
2022-08-03 11:31:02,132 WARNING: This will remove all cache except items used in the workspace of the current and the following repos:
  - /mnt/fast/dev/dcdanko/bdx1
  - /mnt/fast/dev/dcdanko/bdx2
  - /mnt/fast/dev/papciak/Biotia-DX
  - /mnt/fast/dev/tpaisie/bdx
  - /mnt/fast/dev/ahmadazim/Biotia-DX
  - /mnt/fast/dev/hwells/Biotia-DX
Are you sure you want to proceed? [y/n]: y
2022-08-03 11:31:04,244 ERROR: unexpected error - attempt to write a readonly database
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/__init__.py", line 78, in main
    ret = cmd.do_run()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/commands/gc.py", line 51, in run
    self.repo.gc(
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 53, in gc
    all_repos = [Repo(path) for path in repos]
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 53, in <listcomp>
    all_repos = [Repo(path) for path in repos]
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 202, in __init__
    self.state = State(self.root_dir, state_db_dir, self.dvcignore)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/state.py", line 65, in __init__
    self.links = Cache(directory=os.path.join(tmp_dir, "links"), **config)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/diskcache/core.py", line 478, in __init__
    self.reset(key, value, update=False)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/diskcache/core.py", line 2433, in reset
    ((old_value,),) = sql(
sqlite3.OperationalError: attempt to write a readonly database
------------------------------------------------------------
2022-08-03 11:31:05,655 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/fast/bdx/.shared_dvc_cache/.6xypQvximg96enbwqfa4tN.tmp'
2022-08-03 11:31:05,674 DEBUG: Version info for developers:
DVC version: 2.9.5 (pip)
---------------------------------
Platform: Python 3.8.1 on Linux-5.17.5-76051705-generic-x86_64-with-glibc2.10
Supports:
	azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.8.0),
	webhdfs (fsspec = 2022.2.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2022.2.0, boto3 = 1.20.24)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/fastdata-fastlv
Caches: local
Remotes: local, s3, local
Workspace directory: xfs on /dev/mapper/fastdata-fastlv
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-08-03 11:31:05,676 DEBUG: Analytics is disabled.
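In the log, `/mnt/fast/dev/dcdanko/bdx1` shows up among the "following repos" even though it is the current repo, because `.` was passed to `-p`. A small, hypothetical pre-flight helper (not part of DVC) that resolves the `-p` arguments against the working directory and flags any that point back at the current repo or repeat another entry, as a sketch for sanity-checking the command before running it:

```python
import os
from typing import List, Tuple


def check_gc_projects(cwd: str, projects: List[str]) -> Tuple[List[str], List[str]]:
    """Resolve `dvc gc -p` arguments against `cwd`; return (unique paths, notes)."""
    current = os.path.realpath(cwd)
    seen = set()
    unique: List[str] = []
    notes: List[str] = []
    for arg in projects:
        # Relative arguments are resolved against the directory gc is run from;
        # realpath also follows symlinks so aliases of the same repo are caught.
        resolved = os.path.realpath(os.path.join(cwd, arg))
        if resolved == current:
            notes.append(f"{arg!r} is the current repo")
        elif resolved in seen:
            notes.append(f"{arg!r} is listed more than once")
        else:
            seen.add(resolved)
            unique.append(resolved)
    return unique, notes


if __name__ == "__main__":
    paths, problems = check_gc_projects(
        "/mnt/fast/dev/dcdanko/bdx1",
        [".", "../../dcdanko/bdx2", "../../papciak/Biotia-DX/"],
    )
    for problem in problems:
        print("warning:", problem)
    print("-p", " ".join(paths))
```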

dcdanko, Aug 03 '22 15:08

Any update on this?

dcdanko, Aug 17 '22 18:08


Sorry for the late response. I will take a look this week.

daavoo, Aug 17 '22 18:08

Thanks, I just tried with dvc 2.18.1 and the error persists

dcdanko, Aug 17 '22 18:08

I am having the same error while pulling from a local backend.

The local backend is just a bucket that is mounted as a local filesystem and owned by a different user, say myserviceuser. The user I am working as is a member of the myserviceuser group, and the file permissions in the backend grant the group recursive read & write access.

I have confirmed that read & write access to that backend directory works. Still, both dvc pull and dvc push fail with the following error:

ERROR: unexpected error - attempt to write a readonly database

Running sudo dvc pull and sudo dvc push works around it, but this makes a huge mess of the file permissions, since the files end up owned by the root user.
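Write access is easy to confirm for a top-level directory and easy to miss for individual files or subdirectories below it. A minimal sketch of a hypothetical helper (not part of DVC) that walks a tree and lists the entries the current user cannot write, using `os.walk` and `os.access`. Note that `os.access` reports everything as writable for root, so the check is only meaningful without `sudo`.

```python
import os
from typing import List


def unwritable_paths(root: str) -> List[str]:
    """Return directories and files under `root` the current user cannot write."""
    problems: List[str] = []
    # os.walk silently skips directories it cannot list.
    for dirpath, _dirnames, filenames in os.walk(root):
        # Creating or deleting entries (e.g. SQLite journal files) needs
        # write and search permission on the containing directory.
        if not os.access(dirpath, os.W_OK | os.X_OK):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.W_OK):
                problems.append(path)
    return sorted(problems)
```

For example, `for p in unwritable_paths("/path/to/backend"): print(p)`, where the path is a placeholder for the mounted bucket or a repo's `.dvc` directory.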

$ dvc doctor
DVC version: 2.18.1 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.13.0-1031-aws-x86_64-with-glibc2.29
Supports:
        http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        webhdfs (fsspec = 2022.7.1)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1
Caches: local
Remotes: local
Workspace directory: ext4 on /dev/nvme1n1
Repo: dvc, git

john-terraform, Aug 18 '22 08:08

@iddqdiddqd Could you try following this doc https://dvc.org/doc/user-guide/how-to/share-a-dvc-cache#how-to-share-a-dvc-cache and check if the issue persists with that setup?

daavoo, Aug 18 '22 11:08

@dcdanko I am not able to reproduce so far.

Could you share some more details on how you set up the projects? Even a minimal example of 2 projects would be enough.

daavoo, Aug 18 '22 11:08

I don't think there was anything special in how we set them up. I specified a shared cache and users used dvc add/pull to add or load files.

All of the copies of the repo are on the same xfs filesystem and we're using reflinks.

I'm not really sure how to create a minimal reproduction here; this feels like emergent behaviour. With the same repo on a different machine (also xfs, reflinks) I'm getting a lock error instead.

dcdanko, Aug 18 '22 21:08

Some similarities to @iddqdiddqd's setup:

  • one of our remotes was an s3 bucket mounted as a filesystem using s3fs (s3fs-fuse).
  • when I run gc with sudo I get a different error, a lock error, even though no other dvc process is running:
$ ps aux | grep dvc
root        1328  0.0  0.0 348860  7824 ?        SNsl Aug03   0:01 s3fs biotia-dev /dvc_biotiadx /mnt/dvc_biotiadx -o rw,nonempty,allow_other,use_path_request_style,url=https://s3.wasabisys.com/,use_cache=/mnt/bulk/.dvc_biotiadx_cache,dev,suid
dcdanko  3509953  0.0  0.0  19048  2412 pts/3    RN+  17:57   0:00 grep --color=auto dvc
$ sudo /home/dcdanko/miniconda/envs/bdx1/bin/dvc gc -v -w -p . ../../dcdanko/bdx2 ../../papciak/Biotia-DX/ ../../tpaisie/Biotia-DX/ ../../ahmadazim/Biotia-DX/ ../../hwells/Biotia-DX/
2022-08-18 17:54:28,310 WARNING: This will remove all cache except items used in the workspace of the current and the following repos:
  - /mnt/fast/dev/dcdanko/bdx1
  - /mnt/fast/dev/dcdanko/bdx2
  - /mnt/fast/dev/papciak/Biotia-DX
  - /mnt/fast/dev/tpaisie/Biotia-DX
  - /mnt/fast/dev/ahmadazim/Biotia-DX
  - /mnt/fast/dev/hwells/Biotia-DX
Are you sure you want to proceed? [y/n]: y
2022-08-18 17:54:32,404 ERROR: Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly. Check the page <https://dvc.org/doc/user-guide/troubleshooting#lock-issue> for other possible reasons and to learn how to resolve this.
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 59, in _lock_file
    fcntl.flock(file.fileno(), _flags)
BlockingIOError: [Errno 11] Resource temporarily unavailable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 116, in _do_lock
    self._lock = zc.lockfile.LockFile(self._lockfile)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 117, in __init__
    super(LockFile, self).__init__(path)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 90, in __init__
    _lock_file(fp)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 61, in _lock_file
    raise LockError("Couldn't lock %r" % file.name)
zc.lockfile.LockError: Couldn't lock '/mnt/fast/dev/dcdanko/bdx1/.dvc/tmp/lock'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/commands/gc.py", line 68, in run
    self.repo.gc(
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 73, in gc
    stack.enter_context(repo.lock)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/contextlib.py", line 425, in enter_context
    result = _cm_type.__enter__(cm)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 142, in __enter__
    self.lock()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 125, in lock
    lock_retry()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/funcy/flow.py", line 127, in retry
    return call()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 119, in _do_lock
    raise LockError(FAILED_TO_LOCK_MESSAGE)
dvc.lock.LockError: Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly. Check the page <https://dvc.org/doc/user-guide/troubleshooting#lock-issue> for other possible reasons and to learn how to resolve this.
------------------------------------------------------------
2022-08-18 17:54:32,409 DEBUG: Analytics is disabled.
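The lock that fails here is `/mnt/fast/dev/dcdanko/bdx1/.dvc/tmp/lock`, i.e. the repo `gc` was launched from, which is also passed again as `.` in `-p`. One possible explanation (a hypothesis, not confirmed in this thread) is that the command already holds that repo's lock (the `wrapper` frame in `dvc/repo/__init__.py` looks like a locking decorator) and a second `Repo` object for the same path then tries to take it. `flock` locks belong to an open file description, so two separate `open()` calls on the same file conflict even inside one process. A minimal sketch with the standard `fcntl` module (Unix only) showing that conflict:

```python
import fcntl
import os
import tempfile


def second_lock_blocks(path: str) -> bool:
    """Take an exclusive flock on `path`, then try again through a second open()."""
    with open(path, "a") as first, open(path, "a") as second:
        fcntl.flock(first.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            # Separate open() = separate open file description, so this conflicts
            # with the lock held through `first`, even within the same process.
            fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Same exception type as "[Errno 11] Resource temporarily unavailable" above.
            return True
        finally:
            fcntl.flock(first.fileno(), fcntl.LOCK_UN)
        return False


with tempfile.TemporaryDirectory() as tmp:
    blocked = second_lock_blocks(os.path.join(tmp, "lock"))

print(blocked)  # True
```

Since the warning already says gc keeps the data used by "the current and the following repos", listing `.` in `-p` should be redundant; if the hypothesis holds, retrying without it would be a cheap way to test it.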

dcdanko, Aug 18 '22 21:08

@dcdanko thank you! I've followed the steps from the doc you shared and have set up a separate caching directory.

On top of that, I had to set the permissions for GID inheritance (chmod u=rwx,g=rwx,o=,g+s ~/dvc-cache/) and run dvc config cache.type copy so that the files are editable in my setup. My issue is resolved now.
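The permission part of that setup can be sketched in Python as follows. Mode `0o2770` is the octal form of `u=rwx,g=rwx,o=,g+s`; the setgid bit makes new entries inherit the directory's group. The function name and its optional `group` argument (standing in for a shared group such as `myserviceuser`) are hypothetical. The DVC side still runs from the command line: `dvc cache dir <path>` to point the project at the directory, `dvc config cache.shared group` as in the linked doc, and `dvc config cache.type copy` as described above.

```python
import grp
import os
import stat
from typing import Optional


def prepare_shared_cache(path: str, group: Optional[str] = None) -> int:
    """Create `path` with u=rwx,g=rwx,o=,g+s and return its resulting permission bits."""
    os.makedirs(path, exist_ok=True)
    if group is not None:
        # Equivalent of `chgrp`: hand the directory to the shared group (-1 keeps the owner).
        os.chown(path, -1, grp.getgrnam(group).gr_gid)
    # 0o2770: setgid (2), rwx for user (7) and group (7), nothing for others (0).
    # The kernel may silently drop setgid if you are not in the directory's group,
    # which is why the actual bits are read back and returned.
    os.chmod(path, stat.S_ISGID | stat.S_IRWXU | stat.S_IRWXG)
    return stat.S_IMODE(os.stat(path).st_mode)
```

For example, `oct(prepare_shared_cache(os.path.expanduser("~/dvc-cache")))` reports the mode actually applied, so a missing setgid bit shows up immediately.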

UPD: Sorry, I meant to tag @daavoo.

john-terraform, Aug 19 '22 12:08

@daavoo unfortunately I am still having this issue. Is there any more info I can provide?

dcdanko, Sep 15 '22 12:09