clearml-agent
Problems with Public key and ssh
Hi all,
I think I have a related problem, as it involves ssh and trains. On my local PC I have set up trains-agent, which is able to connect to the remote PC running trains. The code runs well on the local machine (git clone and running the code both work), but when I "throw" the job via trains-agent daemon to the trains server, I get the following error (...)
[email protected]: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
(...)
I can only use ssh for cloning the repo from the company git repository. However, that requires adding the public key to the ssh agent. The public key is stored in ~/.ssh/id_rsa.pub and that same file is used when I use the local git commands.
I've read in the instructions and in some of the closed issues that I should set the following in the trains.conf:
agent {
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
# git_user=""
# git_pass=""
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
force_git_ssh_protocol: True
I'm a bit confused on why there is an error. I thought that the "id_rsa.pub" file is fetched from the local PC where the trains-agent daemon is running, but that maybe it's taking the one from the remote PC. So, I did try having the same id_rsa.pub files in ~/.ssh, on both machines but I get the same error, so I'm doubting if that is the problem. Maybe you know more?
I'm running the trains agent like:
trains-agent daemon --gpus 0,1 --queue default --git-user USERNAME --git-pass PASSWORD
I'm a bit confused on why there is an error. I thought that the "id_rsa.pub" file is fetched from the local PC where the trains-agent daemon is running, but that maybe it's taking the one from the remote PC
What remote PC are you referring to? In our terminology, "remote" is usually where the trains-agent is running
I'm running the trains-agent on my laptop, where I develop most of the things, and then I "push" the job onto our remote server, which is running trains and can "fetch" the jobs. So I have the trains-agent running "locally", meaning just that it is not run on the same machine where the trains server is running. Maybe it is clearer now?
Hi @212792736
I'll try to clarify: we should have two ~/.ssh/id_rsa.pub files, one on the local machine and one on the remote machine.
You have to make sure you put the correct ~/.ssh/id_rsa.pub on both machines (trains does not do that for you).
Once you have the SSH key on the remote machine, trains-agent will be using it to clone the repository.
If you want to test it, try to run git clone yourself on the remote machine; if you manage to clone your repository, trains-agent will be able to do the same :)
As far as the trains.conf configuration goes, it seems you have the right one :)
Hi again,
So I deleted all the id_rsa files, and then on the local Windows machine where I am running trains-agent I made a new id_rsa, went into git bash and successfully cloned a full repository. Then I copied that same id_rsa file to the Ubuntu machine that has all the GPUs and has the trains server docker installed and running.
This is the log I get:
(...)
Pass
cloning: [email protected]:212792736/CVUSInference.git
1603964807984 WINDOWSMACHINE:gpu0,1 DEBUG Warning: Permanently added the ECDSA host key for IP address 'git_server_ip_address' to the list of known hosts.
[email protected]: Permission denied (publickey).
fatal: Could not read from remote repository.
(...)
Is this the correct way of using the trains-agent? The idea is to have several different user accounts (trains-agents) that can be run on different machines, none of which is the machine where the trains server is running.
@212792736, can you clone the repository using git in the Ubuntu machine after copying the id_rsa file there?
Yes, though I did need to run chmod 400 ~/.ssh/id_rsa because of the message "It is required that your private key files are NOT accessible by others."
Copying that key to the Windows machine and running trains-agent again resulted in the same public key error.
@212792736 on the remote machine (the one running trains-agent), see if you can clone the specific git repository:
git clone [email protected]:212792736/CVUSInference.git
If you can do that (meaning, without providing any password etc.), the trains-agent will be able to do the same :)
(Basically the trains-agent will be running the git clone command, just like you do)
Make sense?
p.s.
Obviously, in order for your manual git clone to work, you will have to make sure the SSH keys are correct and in the right place.
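A note on the test above: a manual git clone can still succeed because you typed a passphrase or because ssh-agent answered for you, which the trains-agent cannot do. The sketch below is one way to make the check non-interactive, assuming a POSIX-style shell with git and OpenSSH available; the function name check_noninteractive_clone is made up for illustration and is not part of trains-agent.

```shell
# Hypothetical helper: returns 0 only if git can read the repository
# without any interactive prompt (no passphrase, no password).
check_noninteractive_clone() {
    local repo_url="$1"
    # BatchMode=yes makes ssh fail instead of asking for a passphrase or password.
    # ls-remote only lists refs, so nothing is downloaded.
    if GIT_SSH_COMMAND="ssh -o BatchMode=yes" git ls-remote "$repo_url" >/dev/null 2>&1; then
        echo "OK: non-interactive access works"
    else
        echo "FAIL: access needs a prompt or the key is not accepted"
        return 1
    fi
}
```

Call it with the same SSH URL you would pass to git clone, ideally from a fresh shell where ssh-agent is not running. A FAIL here, combined with a manual clone that works, suggests the key is passphrase-protected.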
Hi again! First off, thanks for the support, you guys are doing a great job providing this tool and also the support!
Coming back to the issue at hand, when I want to run
git clone [email protected]:212792736/CVUSInference.git
on the remote machine that is running the trains-agent,
I need to provide the username and password for the id_rsa.pub key that I generated before. I cannot run the git clone without providing the credentials (I run the ssh-agent in git bash, which handles these requests). I thought that I could provide the credentials by calling:
trains-agent daemon --gpus 0,1 --queue default --git-user USERNAME --git-pass PASSWORD
?
I thought that I can provide the credentials by calling:
Hmm I vaguely remember there might be a bug in the command line user/pass.
Could you try configuring the ~/trains.conf with your user/pass?
When I put the git username and password in the trains.conf, I get a different response, even though the following is set:
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
force_git_ssh_protocol: true
Output:
Using user/pass credentials - replacing ssh url '[email protected]' with https url 'https://github.COMPANY.git'
This results in an error, as it is trying to access the https link via a proxy, which is not working. (I tried to work around this by setting the NO_PROXY environment variable for the docker container on Ubuntu to check if that would work, but I was unable to do so at this time.)
Hi @212792736, I think I now fully understand the issue.
If user/pass are provided, trains-agent will pull the repository over https (which, as I understand, does not work in your case).
If you are forcing SSH, your ssh key is encrypted and is looking for user/pass to decrypt it, but trains-agent is not aware of this feature, so it fails on wrong credentials.
Am I missing anything?
Regarding the http / proxy issue, I would recommend trying to edit the ~/.gitconfig on the host machine (the one running the trains-agent) with the following configuration:
[http]
[http "https://github.company.com"]
proxy = http://proxyUsername:[email protected]:port
sslVerify = false
Notice there is no need to worry about mapping the ~/.gitconfig file into the docker; trains-agent takes care of that :)
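The same per-host proxy entries can also be written with git config instead of editing the file by hand. This is a minimal sketch, assuming git is installed; set_git_proxy is a hypothetical helper name, and the URLs in the usage note are placeholders, not real endpoints.

```shell
# Hypothetical helper: write per-host proxy settings into a git config file.
# $1 = config file (e.g. ~/.gitconfig), $2 = git server URL, $3 = proxy URL
set_git_proxy() {
    local config_file="$1" git_url="$2" proxy_url="$3"
    # Produces a [http "<git_url>"] section with a proxy entry.
    git config --file "$config_file" "http.${git_url}.proxy" "$proxy_url"
    # Mirrors the sslVerify = false line above; note this turns off
    # TLS certificate verification for that host.
    git config --file "$config_file" "http.${git_url}.sslVerify" false
}
```

For example, set_git_proxy ~/.gitconfig https://github.company.com http://proxyserver:port would write the proxy and sslVerify entries shown above.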
EDIT: You could also generate ssh credentials that do not need user/pass decryption to use... Since you would have to store both on the same machine, it is not less secure anyhow :)
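A minimal sketch of that option, assuming OpenSSH's ssh-keygen is available. The function names and the key path are placeholders, and ed25519 is only an assumption about what your git server accepts (use -t rsa if it does not).

```shell
# Hypothetical helper: create a key pair with an empty passphrase (-N "").
make_passphraseless_key() {
    local key_path="$1"
    ssh-keygen -q -t ed25519 -N "" -f "$key_path"
    # ssh refuses private keys that are readable by other users.
    chmod 600 "$key_path"
}

# Hypothetical helper: succeeds only if the private key loads with an
# empty passphrase, i.e. it can be used without any prompt.
key_has_no_passphrase() {
    ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1
}
```

After generating the key, its .pub file still has to be registered with the git server. For an existing key, ssh-keygen -p -f <key file> can be used to change or remove the passphrase interactively.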
EDIT: You could also generate ssh credentials that do not need user/pass decryption to use... Since you would have to store both on the same machine, it is not less secure anyhow :)
I managed to get it working with this option! Now I'm struggling to get docker on Ubuntu to use the proxy settings ... I get a NewConnectionError and cannot pip install anything. I'll see how to fix that. Maybe you know a quick way?
specifically the error :
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
ERROR: Could not find a version that satisfies the requirement numpy==1.17.4 (from versions: none)
ERROR: No matching distribution found for numpy==1.17.4
Hi @212792736
I managed to get it working with this option!
nice!
Yes, proxy configuration for pip / python is always funny this way.
Basically you need to set a system-wide proxy configuration for all of python (from pip to trains).
In your trains.conf file, add the following:
agent.extra_docker_arguments = ["-e", "http_proxy=http://[username:password@]proxyserver:port"]
Configuration line here, details on pip here
This will define the environment variable http_proxy, which should solve the system-wide proxy definition.
(If this is not an http proxy but an https proxy, then you need to define https_proxy.)
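If you are unsure which of the two variables applies, one option is to pass both. The sketch below only prints a trains.conf line built from the agent.extra_docker_arguments setting quoted above. It assumes the same proxy URL serves both http and https traffic, and proxy_docker_args is a made-up helper name.

```shell
# Hypothetical helper: print a trains.conf line that passes both proxy
# variables into the docker container started by the agent.
# Assumption: one proxy URL handles both http and https requests.
proxy_docker_args() {
    local proxy_url="$1"
    printf 'agent.extra_docker_arguments = ["-e", "http_proxy=%s", "-e", "https_proxy=%s"]\n' \
        "$proxy_url" "$proxy_url"
}
```

Running proxy_docker_args "http://proxyserver:port" prints a single line that can be pasted into trains.conf in place of the one above.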