clearml-agent

Problems with Public key and ssh

Open 212792736 opened this issue 4 years ago • 13 comments

Hi all,

I think I have a related problem, since it involves ssh and trains. On my local PC I have set up trains-agent, which is able to connect to the remote PC running trains. The code runs fine on the local machine (git clone and running the code both work), but when I "throw" the job via the trains-agent daemon to the trains server, I get the following error (...)

[email protected]: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

(...)

I can only use ssh for cloning the repo from the company git repository. However, that requires adding the key to the ssh agent. The public key is stored in ~/.ssh/id_rsa.pub, and that same file is used when I run git commands locally.

I've read in the instructions and in some of the closed issues that I should set the following in trains.conf:

agent {
    # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
    # leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
    # git_user=""
    # git_pass=""

    # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
    force_git_ssh_protocol: True

I'm a bit confused about why there is an error. I thought that the "id_rsa.pub" file is fetched from the local PC where the trains-agent daemon is running, but maybe it's taking the one from the remote PC. So I tried having the same id_rsa.pub file in ~/.ssh on both machines, but I get the same error, so I doubt that is the problem. Maybe you know more?

I'm running the trains agent like: trains-agent daemon --gpus 0,1 --queue default --git-user USERNAME --git-pass PASSWORD

212792736 avatar Oct 27 '20 13:10 212792736

I'm a bit confused about why there is an error. I thought that the "id_rsa.pub" file is fetched from the local PC where the trains-agent daemon is running, but maybe it's taking the one from the remote PC

What remote PC are you referring to? In our terminology, "remote" is usually where the trains-agent is running

jkhenning avatar Oct 28 '20 12:10 jkhenning

I'm running the trains-agent on my laptop, where I develop most things, and then I "push" the job onto our remote server, which is running trains and can "fetch" the jobs. So I have the trains-agent running "locally", meaning only that it is not run on the same machine as the trains server. Maybe it is clearer now?

212792736 avatar Oct 28 '20 13:10 212792736

Hi @212792736

I'll try to clarify: there should be two ~/.ssh/id_rsa.pub files, one on the local machine and one on the remote machine. You have to make sure you put the correct ~/.ssh/id_rsa.pub on both machines (trains does not do that for you).

Once you have the SSH key on the remote machine, trains-agent will use it to clone the repository. If you want to test it, try running git clone yourself on the remote machine; if you manage to clone your repository, trains-agent will be able to do the same :)

As far as the trains.conf configuration, it seems you have the right one :)

bmartinn avatar Oct 29 '20 01:10 bmartinn

Hi again,

So I deleted all the id_rsa files, and then on the local Windows machine where I am running trains-agent I created a new id_rsa, opened git bash and successfully cloned a full repository. Then I copied that same id_rsa file to the Ubuntu machine that has all the GPUs and the trains server docker installed and running.

This is the log I get:

(...)
Pass

cloning: [email protected]:212792736/CVUSInference.git

1603964807984 WINDOWSMACHINE:gpu0,1 DEBUG Warning: Permanently added the ECDSA host key for IP address 'git_server_ip_address' to the list of known hosts.

[email protected]: Permission denied (publickey).

fatal: Could not read from remote repository.
(...)

Is this the correct way of using the trains-agent? The idea is to have several different user accounts (trains-agents) that can be run on different machines, none of which is the machine where the trains server is running.

212792736 avatar Oct 29 '20 10:10 212792736

@212792736, can you clone the repository using git in the Ubuntu machine after copying the id_rsa file there?

jkhenning avatar Oct 29 '20 10:10 jkhenning

Yes, although I did need to run chmod 400 ~/.ssh/id_rsa because of the message "It is required that your private key files are NOT accessible by others". Copying that key to the Windows machine and running trains-agent again resulted in the same public key error.
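
For reference, a small sketch of checking that a key file is closed to group and others before handing it to ssh. It assumes GNU stat (as on Ubuntu), and `key_is_private` is a made-up helper name, not something trains-agent provides.

```shell
#!/bin/sh
# key_is_private is a hypothetical helper. It succeeds only when the file
# grants no permissions to group or others (for example mode 400 or 600).
# Assumes GNU stat; BSD/macOS stat uses different flags.
key_is_private() {
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        [0-7]00) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   key_is_private ~/.ssh/id_rsa || chmod 400 ~/.ssh/id_rsa
```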

212792736 avatar Oct 29 '20 11:10 212792736

@212792736 in the remote machine (the one running trains-agent) see if you can clone the specific git repository.

git clone [email protected]:212792736/CVUSInference.git

If you can do that (meaning, without providing any password etc.), the trains-agent will be able to do the same :) (Basically, the trains-agent runs the git clone command just like you do.) Makes sense?

p.s. Obviously in order for your manual git clone to work, you will have to make sure the SSH keys are correct, and in the right place.
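
A minimal sketch of that check, assuming a POSIX shell with git and OpenSSH installed. `BatchMode=yes` tells ssh never to prompt, which roughly mimics an unattended process such as the agent. The function name `check_repo_access` is made up for illustration, and the URL is whatever you would pass to git clone.

```shell
#!/bin/sh
# check_repo_access is a hypothetical helper, not part of trains-agent.
# It asks the remote for its refs without cloning. BatchMode=yes makes ssh
# fail instead of prompting for a passphrase or password.
check_repo_access() {
    repo_url="$1"
    if GIT_SSH_COMMAND="ssh -o BatchMode=yes" git ls-remote "$repo_url" >/dev/null 2>&1; then
        echo "OK: $repo_url is readable without a prompt"
    else
        echo "FAILED: cannot read $repo_url non-interactively" >&2
        return 1
    fi
}

# Usage (replace with your own repository URL):
#   check_repo_access "<your ssh repo url>"
```

If this fails while an interactive git clone works, the difference is most likely something that needs a prompt, such as a passphrase on the key.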

bmartinn avatar Oct 30 '20 23:10 bmartinn

Hi again! First off, thanks for the support, you guys are doing a great job providing this tool and also the support!

Coming back to the issue at hand: when I want to run git clone [email protected]:212792736/CVUSInference.git on the remote machine that is running the trains-agent, I need to provide the username and password for the id_rsa.pub key that I generated before. I cannot run the git clone without providing the credentials (I run the ssh-agent in git bash, which handles these requests). I thought that I could provide the credentials by calling: trains-agent daemon --gpus 0,1 --queue default --git-user USERNAME --git-pass PASSWORD ?

212792736 avatar Nov 02 '20 08:11 212792736

I thought that I can provide the credentials by calling:

Hmm I vaguely remember there might be a bug in the command line user/pass. Could you try configuring the ~/trains.conf with your user/pass (here)

bmartinn avatar Nov 03 '20 00:11 bmartinn

When I put the git username and password in trains.conf, I get a different response, even though the following is set:

# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
force_git_ssh_protocol: true

Output: Using user/pass credentials - replacing ssh url '[email protected]' with https url 'https://github.COMPANY.git'

This results in an error, as it is trying to access the https link via a proxy, which is not working. (I tried to work around it by setting the NO_PROXY environment variable for the docker container on Ubuntu to check if that would work, but I was unable to do so at this time.)

212792736 avatar Nov 03 '20 13:11 212792736

Hi @212792736 I think I now fully understand the issue. If user/pass are provided, trains-agent will pull the repository over https (which, as I understand, does not work in your case). If you are forcing SSH, your ssh key is encrypted and needs a user/pass to decrypt it, but trains-agent is not aware of this feature, so it fails on wrong credentials. Am I missing anything?

Regarding the http / proxy issue, from here I would recommend trying to edit the ~/.gitconfig on the host machine (the one running the trains-agent) with the following configuration:

[http]
[http "https://github.company.com"]
	proxy = http://proxyUsername:[email protected]:port
	sslVerify = false

Notice there is no need to worry about mapping the ~/.gitconfig file into the docker; trains-agent takes care of that :)
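
The same per-host proxy entry can also be written with `git config` instead of editing the file by hand. A sketch, assuming git is installed; the proxy URL is a placeholder, and `set_host_proxy` with its optional config-file argument is illustrative only.

```shell
#!/bin/sh
# set_host_proxy is a hypothetical helper. It writes the per-host
# [http "<host url>"] proxy entry into a git config file.
# $1: git server URL, $2: proxy URL, $3: config file (default ~/.gitconfig).
set_host_proxy() {
    host_url="$1"
    proxy_url="$2"
    config_file="${3:-$HOME/.gitconfig}"
    git config --file "$config_file" "http.${host_url}.proxy" "$proxy_url"
}

# Usage (placeholder values):
#   set_host_proxy "https://github.company.com" "http://proxyserver:port"
# Read it back with:
#   git config --file ~/.gitconfig --get "http.https://github.company.com.proxy"
```

The sketch leaves out sslVerify = false, since that turns off certificate checking; it can be set the same way if it is really needed.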

EDIT: You could also generate ssh credentials that do not need user/pass decryption to use... Since you would have to store both on the same machine, it is not less secure anyhow :)
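
To see whether an existing key needs a passphrase, one option is to ask ssh-keygen to derive the public key while supplying an empty passphrase. A sketch, assuming OpenSSH's ssh-keygen is available; `key_has_no_passphrase` is a made-up helper name and `id_agent` is a hypothetical file name.

```shell
#!/bin/sh
# key_has_no_passphrase is a hypothetical helper. ssh-keygen -y prints the
# public key for a private key file; -P "" supplies an empty passphrase, so
# the command fails (without prompting) if the key is encrypted.
key_has_no_passphrase() {
    ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1
}

# Usage:
#   key_has_no_passphrase ~/.ssh/id_rsa && echo "usable unattended"
# A new key without a passphrase could be generated with, for example:
#   ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_agent
```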

bmartinn avatar Nov 04 '20 20:11 bmartinn

EDIT: You could also generate ssh credentials that do not need user/pass decryption to use... Since you would have to store both on the same machine, it is not less secure anyhow :)

I managed to get it working with this option! Now I'm struggling to get docker on Ubuntu to use the proxy settings ... I get a NewConnectionError and cannot pip install anything. I'll see how to fix that. Maybe you know a quick way?

specifically the error :
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
ERROR: Could not find a version that satisfies the requirement numpy==1.17.4 (from versions: none)
ERROR: No matching distribution found for numpy==1.17.4

212792736 avatar Nov 09 '20 07:11 212792736

Hi @212792736

I managed to get it working with this option!

nice!

Yes, proxy configuration for pip / python is always funny this way. Basically, you need to set a system-wide proxy configuration for all of python (from pip to trains). In your trains.conf file, add the following:

agent.extra_docker_arguments = ["-e", "http_proxy=http://[username:password@]proxyserver:port"]

Configuration line here, details on pip here

This will define the environment variable http_proxy which should solve the system wide proxy definition. (if this is not http proxy but an https proxy, then you need to define https_proxy)
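
A sketch of the same setting with both variables defined. The host and port are placeholders carried over from the line above, and whether one or both variables are needed depends on your network. Since pip's default index is served over https, https_proxy is often the one that matters for package installs.

```
# trains.conf fragment (placeholder proxy address, adjust to your setup)
agent.extra_docker_arguments = [
    "-e", "http_proxy=http://proxyserver:port",
    "-e", "https_proxy=http://proxyserver:port"
]
```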

bmartinn avatar Nov 10 '20 19:11 bmartinn