Agent fails to install SSH server when running in venv/Conda

Open norrishd opened this issue 3 years ago • 5 comments

I've followed fairly straightforward steps to install a ClearML agent and connect to it using clearml-session, but get the following output:

Installing SSH Server on ip-172-31-4-42 [172.31.4.42]
Unable to load host key "/home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key.pub": invalid format
Unable to load host key: /home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key.pub
Unable to load host key "/home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key.pub": invalid format
Unable to load host key: /home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key.pub
Unable to load host key: /home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ed25519_key.pub
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring

On the client side I then get:

Password: Error: incorrect password
Please enter password manually:

Any suggestions? Would the recommendation be to install/run the ClearML agent as root and/or using the system Python?

Steps to reproduce

On the agent:

# System: Ubuntu Focal 20.04, AMD64
# Install Miniconda, then
conda create -n clearml python=3.8
pip install clearml-agent
clearml-agent init
# Copy/paste credentials obtained from ClearML server
clearml-agent daemon --queue default --foreground

Then on the client:

clearml-session --public_ip true

# {
#     "base_task_id": null,
#     "git_credentials": false,
#     "jupyter_lab": true,
#     "password": "<long random-looking password>",
#     "public_ip": true,
#     "queue": "default",
#     "vscode_server": true
# }

norrishd avatar Jul 12 '21 05:07 norrishd

Hi @norrishd ,

Thanks for the details - I'll try to reproduce and update as soon as possible!

jkhenning avatar Jul 16 '21 08:07 jkhenning

Hi @norrishd The agent has no permissions to install the SSH server when running inside venv/conda. I'm not sure how we can support it without having root access for it. If an SSH daemon is already installed, it should be able to spin a second copy of it. wdyt?

bmartinn avatar Jul 19 '21 22:07 bmartinn

Thanks for the explanation @bmartinn! So do you mean that the current version is able to spin a second SSH daemon? (assuming there's an SSH daemon installed). If so that's very cool and would be fine (I must just be doing something wrong)

I tend to use venvs for everything just to avoid ever messing with the system Python. But I guess the use case for clearml-agent is that it's intended to run on servers (or in containers?) that are reserved for that purpose, so the recommendation is to use the system Python and install necessary packages there?

Would you also recommend running it as sudo, or does it not need that level of privileges?

norrishd avatar Jul 20 '21 12:07 norrishd

> If so that's very cool and would be fine (I must just be doing something wrong)

Yes, at least in theory. If this doesn't work even though /usr/sbin/sshd is preinstalled, let me know what the setup is; we might be missing something.
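For anyone checking the precondition mentioned above, here is a minimal sketch of how one might confirm that an sshd binary is present on the agent machine. The function name `check_sshd` is hypothetical and only for illustration; `/usr/sbin/sshd` is the path referred to in this thread.

```shell
# Hypothetical helper: report whether an sshd binary is available.
# Checks the path given as $1 (default /usr/sbin/sshd), then falls back to PATH.
check_sshd() {
    sshd_path="${1:-/usr/sbin/sshd}"
    if [ -x "$sshd_path" ]; then
        echo "found: $sshd_path"
    elif command -v sshd >/dev/null 2>&1; then
        echo "found: $(command -v sshd)"
    else
        echo "missing"
        return 1
    fi
}

# Without a preinstalled sshd, a non-root agent cannot install one itself.
check_sshd || echo "No sshd binary found; installing one requires root."
```

If this reports "missing", installing the SSH server system-wide beforehand (as root) is likely needed before a venv/Conda agent can start a session.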

> ... But I guess the use case for clearml-agent is that it's intended to run on servers

The agent itself can be installed in a venv (even though it might be easier to install it system-wide). The issue is the process the agent spins up, i.e. when the agent gets a job (a Task) it can either create a new temporary venv for the Task, install everything the Task needs there, spin up the process and leave, or it can spin up a container for the Task and then repeat the same process (venv creation) inside the container. When the agent is used to spin up clearml-session, the usual setup is that the agent runs in docker mode (i.e. with the flag --docker), so it spins all jobs inside a container, including the clearml-session's remote interactive session. Make sense?
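As a rough sketch of the docker-mode setup described above, the daemon command from the original report could be started with the --docker flag added. This assumes Docker is installed on the agent machine and that the queue is named default, as in the report. The wrapper function `start_docker_agent` and the `DRY_RUN` variable are hypothetical conveniences, not part of clearml-agent.

```shell
# Hypothetical wrapper: start the agent in docker mode on a given queue.
# Set DRY_RUN=1 to print the command instead of running it.
start_docker_agent() {
    queue="${1:-default}"
    # Same command as in the report, with --docker added.
    set -- clearml-agent daemon --queue "$queue" --docker --foreground
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the command that would be run for the "default" queue.
DRY_RUN=1 start_docker_agent default
```

Running the function without DRY_RUN starts the agent itself, after which clearml-session on the client should be served from inside a container rather than from the agent's own venv.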

bmartinn avatar Jul 21 '21 00:07 bmartinn

Yep makes total sense, thanks 😁

norrishd avatar Jul 21 '21 00:07 norrishd