dvc commit takes too long

Open kohei-kawaguchi opened this issue 2 years ago • 4 comments

Bug Report

commit: takes too long

Description

Running dvc commit takes too long, sometimes more than an hour. It is still slow even when I run dvc commit immediately after the previous dvc commit finishes, with no changes to the tracked files.

Reproduce

  1. git pull
  2. dvc pull
  3. dvc commit
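To put numbers on the steps above, each command can be timed separately. The following is only a minimal sketch using the Python standard library; the helper name `time_commands` is hypothetical, and it assumes `git` and `dvc` are on the PATH.

```python
import subprocess
import time


def time_commands(commands):
    """Run each command in order and return a list of (command, exit_code, seconds).

    `commands` is a list of argument lists, e.g. [["dvc", "commit"]].
    """
    results = []
    for cmd in commands:
        start = time.perf_counter()
        # check=False: record a failing exit code instead of raising
        completed = subprocess.run(cmd, check=False)
        elapsed = time.perf_counter() - start
        results.append((cmd, completed.returncode, elapsed))
    return results
```

Calling `time_commands([["git", "pull"], ["dvc", "pull"], ["dvc", "commit"], ["dvc", "commit"]])` from the project folder and printing the results would show whether the second, back-to-back `dvc commit` is as slow as the first.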

Expected

I expect dvc commit to finish quickly when few or no files have changed.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.13 on Windows-10-10.0.22000-SP0
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.5.0, boto3 = 1.21.21)

Additional Information (if any):

I use Microsoft Windows 11 Home 10.0.22000 Build 22000.

The project folder and the local repository are on a network mount. I recently changed laptops, which changed the drive letter from E to D. Suspecting this might be the cause, I tried i) changing the drive letter from D back to E, ii) removing the existing folder with the local repository and recreating it with a fresh git pull/dvc pull, and iii) setting state.dir and index.dir to locations on the C drive. Nevertheless, I encounter the same problem.

I have the same problem with multiple projects on the same device.

kohei-kawaguchi avatar Jun 29 '22 08:06 kohei-kawaguchi

For reference, the issue appears to be related to slow copyfile on checkout: the read takes 89% of the copyfile runtime, while the actual write takes 7%.

[Three screenshots attached, 2022-06-30]

(cprof report available in support email chain)
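A rough way to check the same read/write split outside of DVC is to time the two halves of a chunked copy directly. This is only a sketch with a hypothetical helper, `copy_with_timings`; it is not DVC's copyfile implementation, and the 1 MiB chunk size is an arbitrary assumption.

```python
import time


def copy_with_timings(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks and return (bytes_copied, read_seconds, write_seconds).

    Stand-alone illustration, not DVC code. Write time only covers the
    write() calls; the final flush on close is not included.
    """
    read_seconds = 0.0
    write_seconds = 0.0
    copied = 0
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        while True:
            t0 = time.perf_counter()
            chunk = fsrc.read(chunk_size)
            t1 = time.perf_counter()
            read_seconds += t1 - t0
            if not chunk:  # end of file
                break
            fdst.write(chunk)
            write_seconds += time.perf_counter() - t1
            copied += len(chunk)
    return copied, read_seconds, write_seconds
```

Running it with `src` on the network mount and `dst` on a local drive, and then the other way round, should indicate whether reads from the mount dominate in the same way as in the profile.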

pmrowla avatar Jun 30 '22 04:06 pmrowla

Is the cache type copy? If it is, I'd say the issue is with unnecessary relinking.

skshetry avatar Jun 30 '22 04:06 skshetry

The cache type is set to the default priority, reflink,copy. Since I use Windows, copy is the one that takes effect.

kohei-kawaguchi avatar Jul 05 '22 09:07 kohei-kawaguchi

How can I stop the unnecessary relinking?

kohei-kawaguchi avatar Jul 06 '22 00:07 kohei-kawaguchi