Individual remote config changes across different machines
I have a remote that I access over the network via SFTP. This remote lives on a GPU server. The idea is for a few team members to push datasets to the GPU server as the remote, and for me to pull the datasets as needed on the GPU server to run experiments.
For my team members, the config file will point to the GPU server, with its port and password, whereas on the GPU server itself my remote would be a local folder.
Since we all are going to share the same repo on git, how do I configure specific remotes for each machine in the same git repo?
One potential solution I've seen is to have the default remote for the team members in the git repo point to the network location, while on the GPU server I would run git update-index --skip-worktree [<file>...] on just the config file. That would let me pull any git changes without having to worry about my remote being overridden in the config file on the GPU server.
Is this recommended or are there any better solutions?
@narbhar would the --local option work for you in this case?
With this option you could override any remote and its settings in the .dvc/config.
Or you could use dvc remote add --system or dvc remote add --global on all machines.
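As a rough sketch of how the --local override might look on a team member's machine, the fragment below shows a shared .dvc/config next to an untracked .dvc/config.local. The remote name remote_storage comes from the error message later in this thread; the host, port, user, and path are made-up placeholders, not values from the original setup.

```ini
# .dvc/config (tracked by git, shared by everyone)
# Host, port, user, and path below are hypothetical placeholders.
[core]
    remote = remote_storage
['remote "remote_storage"']
    url = ssh://gpu-server.example.com/data/dvc-storage
    port = 2222
    user = team

# .dvc/config.local (not tracked by git, per machine)
# Keys set here take precedence over the same keys in .dvc/config.
['remote "remote_storage"']
    password = <your-password>
```

Settings like these are what dvc remote modify --local writes, so the password never reaches git.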
A bit more advanced: in this case, on the GPU machine you could even use that remote storage as the cache (dvc cache dir), turn on symlinks, reflinks, etc. (https://dvc.org/doc/user-guide/large-dataset-optimization), and avoid copying data at all.
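A minimal sketch of what that cache suggestion might translate to in the GPU server's .dvc/config.local, assuming the remote's folder is /data/dvc-storage (a hypothetical path) and that it is acceptable for the workspace to link into it. This is an illustration of the idea, not a setup confirmed anywhere in this thread.

```ini
# .dvc/config.local on the GPU server only (not tracked by git)
# /data/dvc-storage is a hypothetical path standing in for the remote's folder.
[cache]
    dir = /data/dvc-storage
    # Link types are tried in order; copy is the fallback.
    type = reflink,symlink,hardlink,copy
```

The same values can be set with dvc cache dir --local and dvc config --local cache.type rather than by editing the file by hand.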
@shcheklein My main concern is that, since this repo is shared by multiple team members, pushing any config changes from the server would overwrite the others' config file when they pull it to their systems.
I currently use the local option to store the password for the connection made to the server instead of pushing it to git. Is there a way to make DVC prioritize config.local over config just for the server? This way I don't need to worry about modifying the git tracked config file while config.local would be untracked.
I'm temporarily using git's skip-worktree flag to work around it. It does seem to work, but I'd like to know about a solution from the DVC side, if one is available.
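For reference, the skip-worktree workaround mentioned here could be wrapped in a few shell helpers like the ones below. The function names are hypothetical, and the sketch assumes it is run from the repository root on the GPU server only, with the DVC config at .dvc/config.

```shell
# Hypothetical helpers; run from the repository root on the GPU server only.

# Tell git to ignore local edits to the tracked DVC config.
freeze_dvc_config() {
    git update-index --skip-worktree .dvc/config
}

# Restore normal tracking, e.g. before committing a real config change.
unfreeze_dvc_config() {
    git update-index --no-skip-worktree .dvc/config
}

# List files currently flagged skip-worktree (git marks them with "S").
list_frozen_files() {
    git ls-files -v | grep '^S' || true
}
```

One caveat with this approach: if a teammate later changes .dvc/config upstream, git may refuse to pull until the flag is cleared and the local edit is stashed or merged.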
DVC prioritize config.local over config just for the server
Hmm, it does prioritize the local config over config. What do you mean by "just for the server"? Could you clarify, please?
What I meant was that the parameters in the config file for accessing the remote folder differ across machines. Since the repo is shared by a bunch of team members, the global file contains config (ssh into the GPU server) that is very similar across all their machines. However, the GPU server, which hosts the remote, also needs to pull and push datasets, so its config would be different, with the remote being a local folder. And since the repo is tracked on git, the global config is always synced across machines, which would override any GPU-server-specific config.
If I try to add a local config on the GPU server, it gives an error: ERROR: configuration error - config file error: extra keys not allowed @ data['remote']['remote_storage']['user'], since the global config is also present whenever I sync my repo.
To prevent this, I used git's skip-worktree flag to stop tracking changes to the config file on just the GPU server, and modified the file to use the appropriate local folder as the remote. I was wondering if DVC has a way to manage these kinds of situations, where the machine hosting the remote also participates in pulling and pushing via DVC.
ERROR: configuration error - config file error: extra keys not allowed @ data['remote']['remote_storage']['user']
I hope there is a simple fix for this. Could you show your configs, please (with the values hidden)?
How about adding a new remote on the server and setting it as the default? Of course, you should add all these configs as --local ones.
As for ERROR: configuration error - config file error: extra keys not allowed @ data['remote']['remote_storage']['user']:
I guess the remote remote_storage wasn't meant to have a user setting.
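The suggestion above of adding a separate remote on the server could look roughly like the following .dvc/config.local. The name gpu_local and the path /data/dvc-storage are hypothetical. One plausible reading of the error, not confirmed in this thread, is that overriding only the url of remote_storage locally leaves the user key from the shared config attached to what is now a local-folder remote. A separately named remote avoids inheriting those keys.

```ini
# .dvc/config.local on the GPU server only (not tracked by git)
# gpu_local and /data/dvc-storage are hypothetical names for illustration.
[core]
    remote = gpu_local
['remote "gpu_local"']
    url = /data/dvc-storage
```

The equivalent commands would be along the lines of dvc remote add --local gpu_local /data/dvc-storage followed by dvc remote default --local gpu_local, which leaves the shared .dvc/config untouched.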
closing as stale