`dvc pull`: Config field `jobs` in `.dvc/config` not taken into account as expected

Open stefan-hartmann-lgs opened this issue 3 years ago • 9 comments

Bug Report

Description

dvc pull is still using cpu_count() * 4 download jobs even if jobs=4 is defined under the remote in use in .dvc/config.

That said, jobs=4 does seem to have some effect, because without it dvc pull often fails with:

image

With jobs=4 configured, that failure no longer occurs, so the setting is evidently being applied somewhere.

Maybe the documentation is just not clear about this?

Reproduce

  1. dvc init
  2. Edit .dvc/config file and add a remote - for instance:
[core]
    remote = artifactory
['remote "artifactory"']
    url = https://my.artifactory.com/...
    auth = basic
    method = PUT
    jobs = 4
  3. Run dvc pull
  4. See that more than 4 download jobs are running in parallel

Expected

It was expected that the jobs=4 config in .dvc/config file does the same as running dvc pull --jobs 4 which is not the case.
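
To make the expectation concrete, here is a minimal sketch of the precedence I had in mind: an explicit --jobs wins, then jobs from the remote section, then the default. `resolve_jobs` is a hypothetical helper written for this illustration, not DVC code, and the `cpu_count() * 4` fallback is simply the behavior described above.

```python
import os
from typing import Optional


def resolve_jobs(cli_jobs: Optional[int], remote_jobs: Optional[int]) -> int:
    """Return the number of parallel jobs a user might expect `dvc pull` to use.

    Hypothetical helper for illustration only; not part of DVC.
    """
    if cli_jobs is not None:
        # An explicit `--jobs N` on the command line takes priority.
        return cli_jobs
    if remote_jobs is not None:
        # Otherwise use `jobs` from the remote section of .dvc/config.
        return remote_jobs
    # Fallback as described in this issue: cpu_count() * 4.
    return (os.cpu_count() or 1) * 4


print(resolve_jobs(None, 4))  # jobs taken from the remote config
print(resolve_jobs(8, 4))     # --jobs overrides the remote config
```

With this ordering, jobs = 4 in the config and dvc pull --jobs 4 would give the same result.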

Environment information

DVC 2.10.2 on Windows.

Output of dvc doctor:

$ dvc doctor

DVC version: 2.10.2 (exe)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
        gdrive (pydrive2 = 1.10.1),
        gs (gcsfs = 2022.3.0),
        hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
        ssh (sshfs = 2022.3.1),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.7),
        webdavs (webdav4 = 0.9.7)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: NTFS on C:\
Repo: dvc, git

stefan-hartmann-lgs, May 11 '22 13:05

This is not currently supported (see the available config fields for the remote section at https://dvc.org/doc/command-reference/config#remote).

Is there any particular part of the documentation that led you to:

> It was expected that the jobs=4 config in .dvc/config file does the same as running dvc pull --jobs 4 which is not the case.

daavoo, May 13 '22 09:05

@daavoo That links to https://dvc.org/doc/command-reference/remote/modify#available-parameters-for-all-remotes, which does show jobs as an option. Am I misunderstanding?

dberenbaum, May 13 '22 15:05

🙏 ignore my previous comment

daavoo, May 13 '22 17:05

> @daavoo That links to https://dvc.org/doc/command-reference/remote/modify#available-parameters-for-all-remotes, which does show jobs as an option. Am I misunderstanding?

Yes, exactly. The jobs setting in .dvc/config does something, but not the same thing as --jobs, which is what I expected based on the documentation.

stefan-hartmann-lgs, May 16 '22 12:05

Hi @stefan-hartmann-lgs! How are you checking the "See that more than 4 download jobs are running in parallel" step?

I have tried it, and the jobs config option is honored when instantiating the filesystem and is passed down to the transfer task:

https://github.com/iterative/dvc/blob/f23d31af644ab4ad4492b9cfa1000d58420c238d/dvc/fs/init.py#L94-L110

Could you try sharing the output profile from:

pip install viztracer
dvc pull --viztracer-depth 8

daavoo, May 19 '22 06:05

Hi @daavoo

Sorry for the laaaaate reply. I can see in my console that more than 4 jobs are running (even with jobs=4 in .dvc/config).

As stated above, I would have expected only 4 jobs to be running, but I see 32 (see screenshot below).

image
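
For reference, this is roughly what I mean by "only 4 jobs running": a generic sketch of a worker pool capped at `jobs`, which records the highest number of simulated downloads in flight at once. It is not how DVC transfers files; `peak_concurrency` and `fake_download` are hypothetical names, and the sleep stands in for network I/O.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(jobs: int, n_files: int = 32) -> int:
    """Run n_files simulated downloads on `jobs` workers and return the peak
    number that were running at the same time."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_download(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the actual network transfer
        with lock:
            running -= 1

    # The pool size is the only thing limiting parallelism here.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(fake_download, range(n_files)))
    return peak


print(peak_concurrency(4) <= 4)
```

With a cap of 4, the peak never exceeds 4 no matter how many files are queued, which is the behavior I expected from jobs = 4.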

stefan-hartmann-lgs, Jun 27 '22 12:06

It's better to use dvc config -l instead of running type .dvc/config, because there might be some hidden settings in .dvc/config.local.
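
A rough sketch of why that matters: values in .dvc/config.local override those in .dvc/config, so printing only one file can hide the effective value. The example below merges two config texts with Python's `configparser` purely for illustration; DVC does not necessarily parse its config this way, `effective_config` is a hypothetical helper, and the local override (jobs = 16) is invented for the example.

```python
import configparser

# Repo-level config, trimmed from the report above.
REPO_CONFIG = """
[core]
    remote = artifactory
['remote "artifactory"']
    url = https://my.artifactory.com/...
    jobs = 4
"""

# Hypothetical .dvc/config.local content, invented for this illustration.
LOCAL_CONFIG = """
['remote "artifactory"']
    jobs = 16
"""


def effective_config(*layers: str) -> configparser.ConfigParser:
    """Merge config texts in order; later layers override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    for text in layers:
        parser.read_string(text)
    return parser


# configparser keeps the quotes as part of the section name.
section = "'remote \"artifactory\"'"
merged = effective_config(REPO_CONFIG, LOCAL_CONFIG)
print(merged[section]["jobs"])
```

Here the merged value comes from the local layer even though the repo file says jobs = 4, which is the kind of surprise dvc config -l would reveal.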

karajan1001, Jul 02 '22 03:07

@daavoo any news on this one?

stefan-hartmann-lgs, Aug 24 '22 07:08

Ping @daavoo

dberenbaum, Aug 30 '22 01:08