`dvc pull`: Config field `jobs` in `.dvc/config` not taken into account as expected
Bug Report
Description
`dvc pull` is still using `cpu_count() * 4` download jobs even if `jobs = 4` is defined under the remote in use in `.dvc/config`.
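For context, the default described above would amount to something like the following (a sketch of the reported behavior, not DVC's actual code; `default_jobs` is a hypothetical name):

```python
import os

def default_jobs() -> int:
    # The report states that dvc pull falls back to cpu_count() * 4
    # download jobs when the config value is not honored.
    # os.cpu_count() may return None, so fall back to 1 CPU.
    return (os.cpu_count() or 1) * 4
```

On an 8-core machine this would give 32 parallel jobs, which matches the number the reporter observes later in the thread.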
Furthermore, the `jobs = 4` setting does seem to have *some* effect, as `dvc pull` otherwise often fails with:

With `jobs = 4` configured, that problem is gone, so the setting is clearly doing something.
Maybe this is simply not clear from the documentation?
Reproduce
1. Run `dvc init`.
2. Edit the `.dvc/config` file and add a remote, for instance:

   ```ini
   [core]
       remote = artifactory
   ['remote "artifactory"']
       url = https://my.artifactory.com/...
       auth = basic
       method = PUT
       jobs = 4
   ```

3. Run `dvc pull`.
4. See that more than 4 download jobs are running in parallel.
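As a side note, the DVC config file is INI-style, so the value actually present in it can be double-checked with Python's `configparser` (an illustration only; DVC uses its own config machinery internally):

```python
import configparser

# The same config snippet as in the reproduce steps above.
config_text = '''
[core]
    remote = artifactory
['remote "artifactory"']
    url = https://my.artifactory.com/...
    auth = basic
    method = PUT
    jobs = 4
'''

parser = configparser.ConfigParser()
parser.read_string(config_text)

# DVC writes the remote section name with quotes, so the
# configparser section key includes them literally.
section = parser['\'remote "artifactory"\'']
print(section.getint("jobs"))  # -> 4
```

This confirms the file itself carries `jobs = 4`; the question is whether `dvc pull` reads it.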
Expected
The expectation was that `jobs = 4` in the `.dvc/config` file behaves the same as running `dvc pull --jobs 4`, which is not the case.
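In other words, the expected behavior is a precedence order along these lines (a sketch of the assumed intent; `effective_jobs` is a hypothetical helper, not a DVC API):

```python
def effective_jobs(cli_jobs, config_jobs, default):
    # Assumed precedence: an explicit `dvc pull --jobs N` beats the
    # remote's `jobs` config entry, which beats the built-in default.
    if cli_jobs is not None:
        return cli_jobs
    if config_jobs is not None:
        return config_jobs
    return default
```

Under this reading, `jobs = 4` in `.dvc/config` with no `--jobs` flag should yield 4 workers, not the default.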
Environment information
DVC 2.10.2 on Windows.
Output of `dvc doctor`:

```console
$ dvc doctor
DVC version: 2.10.2 (exe)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
    azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
    gdrive (pydrive2 = 1.10.1),
    gs (gcsfs = 2022.3.0),
    hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
    webhdfs (fsspec = 2022.3.0),
    http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
    https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
    s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
    ssh (sshfs = 2022.3.1),
    oss (ossfs = 2021.8.0),
    webdav (webdav4 = 0.9.7),
    webdavs (webdav4 = 0.9.7)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: NTFS on C:\
Repo: dvc, git
```
This is not currently supported (see the available config fields for the remote section in https://dvc.org/doc/command-reference/config#remote).
Is there any particular part of the documentation that led you to:
> It was expected that the jobs=4 config in .dvc/config file does the same as running dvc pull --jobs 4 which is not the case.
@daavoo That links to https://dvc.org/doc/command-reference/remote/modify#available-parameters-for-all-remotes, which does show jobs as an option. Am I misunderstanding?
🙏 ignore my previous comment
> @daavoo That links to https://dvc.org/doc/command-reference/remote/modify#available-parameters-for-all-remotes, which does show `jobs` as an option. Am I misunderstanding?

Yes, exactly. The `jobs` setting in `.dvc/config` does something, but not the same as what `--jobs` does, and that is what I expected based on the documentation.
Hi @stefan-hartmann-lgs! How are you checking the "See that more than 4 download jobs are running in parallel" step?
I have tried it, and the `jobs` config section is honored when instantiating the filesystem and passed down to the transfer task:
https://github.com/iterative/dvc/blob/f23d31af644ab4ad4492b9cfa1000d58420c238d/dvc/fs/__init__.py#L94-L110
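The behavior being described, a `jobs` value capping the number of parallel transfers, can be illustrated with a minimal thread-pool sketch (this is NOT DVC's transfer code; `transfer` and `download_one` are hypothetical names):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def transfer(files, jobs):
    """Sketch: run downloads with at most `jobs` parallel workers."""
    peak = 0
    active = 0
    lock = threading.Lock()

    def download_one(f):
        nonlocal peak, active
        with lock:
            active += 1
            peak = max(peak, active)  # record concurrency high-water mark
        time.sleep(0.001)  # stand-in for the actual network transfer
        with lock:
            active -= 1
        return f

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(download_one, files))
    return peak  # highest number of simultaneous "downloads" observed
```

If `jobs` were honored end-to-end, the observed peak could never exceed the configured value, which is what the reporter's console check is probing.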
Could you try sharing the output profile from:

```shell
pip install viztracer
dvc pull --viztracer-depth 8
```
Hi @daavoo
Sorry for the laaaaate reply - I can see that more than 4 jobs are running in my console (even though `jobs = 4` is set in `.dvc/config`).
As stated above, I would have expected only 4 jobs to be running, but I see 32 (see screenshot below).

It's better to use `dvc config -l` instead of `type .dvc/config`, since there might be hidden settings in `.dvc/config.local`.
@daavoo any info on this one?
Ping @daavoo