dvc
gc: fails when attempting to remove cache shared by multiple projects
Bug Report
Issue name
gc: fails when attempting to remove cache shared by multiple projects
Description
When attempting to garbage collect files shared by multiple projects, dvc throws an error saying it is attempting to write a read-only database.
Reproduce
I don't have multiple dvc repos to reproduce on
Expected
dvc performs gc as normal
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.9.5 (pip)
---------------------------------
Platform: Python 3.8.1 on Linux-5.17.5-76051705-generic-x86_64-with-glibc2.10
Supports:
azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.8.0),
webhdfs (fsspec = 2022.2.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.2.0, boto3 = 1.20.24)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/fastdata-fastlv
Caches: local
Remotes: local, s3, local
Workspace directory: xfs on /dev/mapper/fastdata-fastlv
Repo: dvc, git
Additional Information (if any):
$ dvc gc -v -w -p . ../../dcdanko/bdx2 ../../papciak/Biotia-DX/ ../../tpaisie/bdx/ ../../ahmadazim/Biotia-DX/ ../../hwells/Biotia-DX/
2022-08-03 11:31:02,132 WARNING: This will remove all cache except items used in the workspace of the current and the following repos:
- /mnt/fast/dev/dcdanko/bdx1
- /mnt/fast/dev/dcdanko/bdx2
- /mnt/fast/dev/papciak/Biotia-DX
- /mnt/fast/dev/tpaisie/bdx
- /mnt/fast/dev/ahmadazim/Biotia-DX
- /mnt/fast/dev/hwells/Biotia-DX
Are you sure you want to proceed? [y/n]: y
2022-08-03 11:31:04,244 ERROR: unexpected error - attempt to write a readonly database
------------------------------------------------------------
Traceback (most recent call last):
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/__init__.py", line 78, in main
ret = cmd.do_run()
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/commands/gc.py", line 51, in run
self.repo.gc(
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
return f(repo, *args, **kwargs)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 53, in gc
all_repos = [Repo(path) for path in repos]
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 53, in <listcomp>
all_repos = [Repo(path) for path in repos]
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 202, in __init__
self.state = State(self.root_dir, state_db_dir, self.dvcignore)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/state.py", line 65, in __init__
self.links = Cache(directory=os.path.join(tmp_dir, "links"), **config)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/diskcache/core.py", line 478, in __init__
self.reset(key, value, update=False)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/diskcache/core.py", line 2433, in reset
((old_value,),) = sql(
sqlite3.OperationalError: attempt to write a readonly database
------------------------------------------------------------
2022-08-03 11:31:05,655 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/fast/bdx/.shared_dvc_cache/.6xypQvximg96enbwqfa4tN.tmp'
2022-08-03 11:31:05,674 DEBUG: Version info for developers:
DVC version: 2.9.5 (pip)
---------------------------------
Platform: Python 3.8.1 on Linux-5.17.5-76051705-generic-x86_64-with-glibc2.10
Supports:
azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.8.0),
webhdfs (fsspec = 2022.2.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.2.0, boto3 = 1.20.24)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/fastdata-fastlv
Caches: local
Remotes: local, s3, local
Workspace directory: xfs on /dev/mapper/fastdata-fastlv
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-08-03 11:31:05,676 DEBUG: Analytics is disabled.
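For reference, the error message at the bottom of the traceback comes from SQLite itself rather than from DVC. The sketch below is not a reproduction of the gc failure; it only shows the same `sqlite3.OperationalError` in isolation. It is a minimal illustration under stated assumptions: the table name `settings` and the helper `write_to_readonly_db` are hypothetical, and the read-only condition is simulated with SQLite's `mode=ro` URI option instead of real file permissions so that it behaves the same for any user.

```python
import os
import sqlite3
import tempfile


def write_to_readonly_db() -> str:
    """Return the message SQLite raises when writing through a read-only connection."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        db_path = os.path.join(tmp_dir, "cache.db")

        # Create a small database while it is still writable.
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
        conn.commit()
        conn.close()

        # Reopen it read-only. mode=ro stands in for a database file
        # that the current user is not allowed to write to.
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            conn.execute("INSERT INTO settings VALUES ('count', '0')")
        except sqlite3.OperationalError as exc:
            return str(exc)
        finally:
            conn.close()
    return ""


if __name__ == "__main__":
    print(write_to_readonly_db())
```

In the traceback above, the failing call is the `diskcache` `Cache` constructor, which writes to its SQLite database as soon as it is opened, so simply instantiating `Repo(path)` for another user's project is enough to hit this if that database is not writable.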
Any update on this?
Sorry for the late response. I will take a look this week
Thanks, I just tried with dvc 2.18.1 and the error persists
I am having the same error while pulling from local backend.
The local backend is just a bucket that is mounted as a local filesystem and assigned to a different user, say myserviceuser. The user I am working as is a member of the myserviceuser group. The file permissions in the backend include recursive read and write access for the group.
Read & write access for that backend directory is confirmed to be working fine. Still, dvc pull & dvc push result in the following error:
ERROR: unexpected error - attempt to write a readonly database
Using sudo dvc pull and sudo dvc push unblocks things, but it makes a huge mess of the file permissions, because the files end up assigned to the root user.
$ dvc doctor
DVC version: 2.18.1 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.13.0-1031-aws-x86_64-with-glibc2.29
Supports:
http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
webhdfs (fsspec = 2022.7.1)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1
Caches: local
Remotes: local
Workspace directory: ext4 on /dev/nvme1n1
Repo: dvc, git
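One way to narrow down a report like the one above is to list which paths the current user cannot actually write to. The helper below is a rough diagnostic sketch, not part of DVC: `find_unwritable` is a hypothetical name, and pointing it at a project's `.dvc/tmp` directory assumes that is where the SQLite state files live in your DVC version. Directories are checked as well as files, because SQLite needs to create journal files next to the database.

```python
import os
from typing import List


def find_unwritable(root: str) -> List[str]:
    """List directories and files under root that the current user cannot write to."""
    blocked: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself plus every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        blocked.extend(path for path in candidates if not os.access(path, os.W_OK))
    return sorted(blocked)


if __name__ == "__main__":
    # Assumption: run from the project root; adjust the path for a shared cache.
    for path in find_unwritable(os.path.join(".", ".dvc", "tmp")):
        print("not writable:", path)
```

Note that `os.access` answers for the real user ID, so running this under sudo reports nothing as blocked.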
@iddqdiddqd Could you try following this doc https://dvc.org/doc/user-guide/how-to/share-a-dvc-cache#how-to-share-a-dvc-cache and check if the issue persists with that setup?
@dcdanko I am not able to reproduce so far.
Could you share some more details on how you set up the projects? Even a minimal example of 2 projects would be enough.
I don't think there was anything special in how we set them up. I specified a shared cache and users used dvc add/pull to add or load files.
All of the copies of the repo are on the same xfs filesystem and we're using reflinks.
I'm not really sure how to create a minimal reproduction here; this feels like emergent behaviour. With the same repo on a different machine (also xfs, reflinks) I'm getting a lock error instead.
Some similarities to @iddqdiddqd:
- one of our remotes was an s3 bucket mounted as a filesystem using s3fuse.
- when I run gc as sudo I get a different error: a lock error, even though no other dvc process is running
$ ps aux | grep dvc
root 1328 0.0 0.0 348860 7824 ? SNsl Aug03 0:01 s3fs biotia-dev /dvc_biotiadx /mnt/dvc_biotiadx -o rw,nonempty,allow_other,use_path_request_style,url=https://s3.wasabisys.com/,use_cache=/mnt/bulk/.dvc_biotiadx_cache,dev,suid
dcdanko 3509953 0.0 0.0 19048 2412 pts/3 RN+ 17:57 0:00 grep --color=auto dvc
$ sudo /home/dcdanko/miniconda/envs/bdx1/bin/dvc gc -v -w -p . ../../dcdanko/bdx2 ../../papciak/Biotia-DX/ ../../tpaisie/Biotia-DX/ ../../ahmadazim/Biotia-DX/ ../../hwells/Biotia-DX/
2022-08-18 17:54:28,310 WARNING: This will remove all cache except items used in the workspace of the current and the following repos:
- /mnt/fast/dev/dcdanko/bdx1
- /mnt/fast/dev/dcdanko/bdx2
- /mnt/fast/dev/papciak/Biotia-DX
- /mnt/fast/dev/tpaisie/Biotia-DX
- /mnt/fast/dev/ahmadazim/Biotia-DX
- /mnt/fast/dev/hwells/Biotia-DX
Are you sure you want to proceed? [y/n]: y
2022-08-18 17:54:32,404 ERROR: Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly. Check the page <https://dvc.org/doc/user-guide/troubleshooting#lock-issue> for other possible reasons and to learn how to resolve this.
------------------------------------------------------------
Traceback (most recent call last):
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 59, in _lock_file
fcntl.flock(file.fileno(), _flags)
BlockingIOError: [Errno 11] Resource temporarily unavailable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 116, in _do_lock
self._lock = zc.lockfile.LockFile(self._lockfile)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 117, in __init__
super(LockFile, self).__init__(path)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 90, in __init__
_lock_file(fp)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/zc/lockfile/__init__.py", line 61, in _lock_file
raise LockError("Couldn't lock %r" % file.name)
zc.lockfile.LockError: Couldn't lock '/mnt/fast/dev/dcdanko/bdx1/.dvc/tmp/lock'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/__init__.py", line 185, in main
ret = cmd.do_run()
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/commands/gc.py", line 68, in run
self.repo.gc(
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
return f(repo, *args, **kwargs)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 73, in gc
stack.enter_context(repo.lock)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/contextlib.py", line 425, in enter_context
result = _cm_type.__enter__(cm)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 142, in __enter__
self.lock()
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 125, in lock
lock_retry()
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/funcy/decorators.py", line 45, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/funcy/flow.py", line 127, in retry
return call()
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/funcy/decorators.py", line 66, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/lock.py", line 119, in _do_lock
raise LockError(FAILED_TO_LOCK_MESSAGE)
dvc.lock.LockError: Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly. Check the page <https://dvc.org/doc/user-guide/troubleshooting#lock-issue> for other possible reasons and to learn how to resolve this.
------------------------------------------------------------
2022-08-18 17:54:32,409 DEBUG: Analytics is disabled.
@dcdanko thank you! I've followed the steps from the doc you shared and have set up a separate caching directory.
On top of that, I had to adjust permissions for GID inheritance (chmod u=rwx,g=rwx,o=,g+s ~/dvc-cache/) and use dvc config cache.type copy so that the files are editable within my setup. My issue is resolved now.
UPD: I am sorry, meant to tag @daavoo
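For anyone scripting the permission change described above, here is a small Python sketch of the same mode. It only mirrors the `chmod u=rwx,g=rwx,o=,g+s` command from this comment; the names `SHARED_CACHE_MODE` and `make_group_shared` are made up for illustration, and `~/dvc-cache/` is the path from my setup, not a DVC default.

```python
import os
import stat

# Full access for owner and group, none for others, plus the setgid bit so
# that new files created inside the directory inherit its group (octal 2770).
SHARED_CACHE_MODE = stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID


def make_group_shared(directory: str) -> None:
    """Apply the shared-cache mode to a single directory (not recursive)."""
    os.chmod(directory, SHARED_CACHE_MODE)


if __name__ == "__main__":
    # Assumption: the shared cache lives at ~/dvc-cache/ as in the comment above.
    make_group_shared(os.path.expanduser("~/dvc-cache/"))
```

The setgid bit only affects entries created after it is set, so existing subdirectories may still need the same treatment.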
@daavoo unfortunately I still am having this issue, any more info I can provide?