Securing Aim Remote Tracking server using SSL key and certificate
Hi, first of all I appreciate all the work you've put into making Aim!
I am having some trouble securing the connection to the Aim Remote Tracking (RT) Server, and was wondering if you could help me out.
I recently set up a virtual machine on Azure, which runs both the Aim RT Server and the Aim UI. To do this, I use a docker-compose.yml that brings up both the server and the UI. This works properly: I can log runs from another machine and see them appear in the UI, great.
However, now I want to secure the connection to the remote tracking server using SSL, as described here. I've created a self-signed key and certificate file using openssl, as described here.
Whenever I bring up the server using this command, everything seems to be in working order and I do not get any errors:
aim server --repo ~/mycontainer/aim/ --ssl-keyfile ~/secrets/server.key --ssl-certfile ~/secrets/server.crt --host 0.0.0.0 --dev --port 53800
But then when I try to log a run from another machine, I get the following error on the client:
azureuser@ml-ci-jvranken-prd:~/cloudfiles/code/Users/jvranken/aim-tracking-server$ python aim_test.py
Failed to connect to Aim Server. Have you forgot to run `aim server` command?
Traceback (most recent call last):
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
httplib_response = conn.getresponse()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
retries = retries.increment(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
httplib_response = conn.getresponse()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 14, in wrapper
return func(*args, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 138, in connect
response = requests.get(endpoint, headers=self.request_headers)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 682, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/ml-ci-jvranken-prd/code/Users/jvranken/aim-tracking-server/aim_test.py", line 7, in <module>
run = Run(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 70, in wrapper
_SafeModeConfig.exception_callback(e, func)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 47, in reraise_exception
raise e
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 68, in wrapper
return func(*args, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 859, in __init__
super().__init__(run_hash, repo=repo, read_only=read_only, experiment=experiment, force_resume=force_resume)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 272, in __init__
super().__init__(run_hash, repo=repo, read_only=read_only, force_resume=force_resume)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/base_run.py", line 34, in __init__
self.repo = get_repo(repo)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo_utils.py", line 26, in get_repo
repo = Repo.from_path(repo)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 210, in from_path
repo = Repo(path, read_only=read_only, init=init)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 121, in __init__
self._client = Client(remote_path)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 50, in __init__
self.connect()
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 18, in wrapper
raise RuntimeError(error_message)
RuntimeError: Failed to connect to Aim Server. Have you forgot to run `aim server` command?
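A `RemoteDisconnected` at the bottom of the chain is what one would expect if the client sent a plain HTTP request to a port that only speaks TLS. One way to check this independently of Aim is to attempt a verified TLS handshake against the server port while trusting the self-signed certificate. The following is a minimal sketch using only the Python standard library; the function name `tls_handshake_ok` is made up for illustration, and it assumes a copy of `server.crt` is available on the client machine.

```python
import socket
import ssl
from typing import Optional


def tls_handshake_ok(host: str, port: int, ca_file: Optional[str] = None,
                     timeout: float = 5.0) -> bool:
    """Return True if a verified TLS handshake with host:port succeeds.

    ca_file is a PEM file to trust. For a self-signed setup this would be
    the server's own certificate; None means the system trust store is used.
    """
    # With cafile set, only the certificates in that file are trusted.
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname enables hostname checking, so the certificate
            # must list this host (or IP) in its subject alternative names.
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError as exc:
        # ssl.SSLError, timeouts and refused connections are all OSError.
        print(f"TLS handshake with {host}:{port} failed: {exc!r}")
        return False
```

For example, `tls_handshake_ok("my-aim-host", 53800, ca_file="server.crt")` (hypothetical host name and path) should return `True` when the server is really serving TLS with that certificate. If it succeeds with `ca_file` set but fails with `ca_file=None`, the server side is fine and the client simply does not trust the self-signed certificate.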
Do you have any clue as to why this is not working? Here is the docker-compose.yaml and the python file I'm using:
services:
  ui:
    image: aimstack/aim:3.20.1
    container_name: aim_ui
    restart: unless-stopped
    command: up --host 0.0.0.0 --port 43800 --dev
    ports:
      - 80:43800
    volumes:
      - ~/mycontainer/aim:/opt/aim
    networks:
      - aim
  server:
    image: aimstack/aim:3.20.1
    container_name: aim_server
    restart: unless-stopped
    command: server --host 0.0.0.0 --dev --ssl-keyfile /opt/secrets/server.key --ssl-certfile /opt/secrets/server.crt
    ports:
      - 53800:53800
    volumes:
      - ~/mycontainer/aim:/opt/aim
      - ~/secrets:/opt/secrets
    networks:
      - aim
networks:
  aim:
    driver: bridge
from aim import Run

# AIM_REPO='/home/azureuser/mycontainer/aim'
AIM_REPO='aim://REDACTED:53800'
AIM_EXPERIMENT='SSL-server'

run = Run(
    repo=AIM_REPO,
    experiment=AIM_EXPERIMENT
)

hparams_dict = {
    'learning_rate': 0.001,
    'batch_size': 32,
}
run['hparams'] = hparams_dict

# log metric
for i in range(30):
    if i % 5 == 0:
        i = i * 0.347
    run.track(float(i), name='numbers')
@JeroenVranken thanks for the issue. This could be related to the auth token things we have added recently. @mihran113 @alberttorosyan what do you guys think?
Any update on this? I think I faced a similar issue in #3206.
This error occurs with version 3.20.1, but everything works fine when I revert to the 3.17.4 version of AIM
Have you tried the latest version, 3.23.0? I seem to be dealing with the same issue.
@erikdao did you manage to resolve it? In general, it seems the Aim docs could be improved for production deployments. The docker-compose file is missing even in the repo, and there are a few Docker/SSL-related issues that have been open for a long time.
It turned out that my problem was different. I didn't enable SSL when deploying Aim Server. My problem was related to networking on GCP.
Hey folks! Sorry for the late response.
I've opened a PR which will add support for self-signed SSL certificates.
The problem here is that, by default, the requests package doesn't trust self-signed certificates and needs the path to a custom cert file to verify against. Without it, our protocol probe logic (which chooses between http and https for the client) cannot succeed over https and falls back to http, which results in the errors shared above.
This addition will allow specifying the cert file path via an environment variable, so that the client flow works as expected.
The changes will be included in the upcoming 3.25.0 release.
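To illustrate the failure mode described above, here is a rough sketch of a probe-then-fall-back flow. It is not Aim's actual client code: the names `https_probe` and `choose_scheme` are invented for this example, and it uses the standard library's `urllib` rather than requests so that it runs without extra dependencies.

```python
import ssl
import urllib.error
import urllib.request
from typing import Callable, Optional


def https_probe(url: str, ca_file: Optional[str] = None,
                timeout: float = 5.0) -> None:
    """Issue one verified HTTPS GET; raise if TLS or the connection fails."""
    # ca_file=None uses the system trust store, which does not contain a
    # self-signed certificate, so verification fails in that case.
    context = ssl.create_default_context(cafile=ca_file)
    try:
        urllib.request.urlopen(url, timeout=timeout, context=context).close()
    except urllib.error.HTTPError:
        # An HTTP error status still means the TLS handshake succeeded.
        pass


def choose_scheme(probe: Callable[[], None]) -> str:
    """Return 'https' if the probe succeeds, otherwise fall back to 'http'."""
    try:
        probe()
        return "https"
    except OSError:
        # urllib.error.URLError and ssl.SSLError are both OSError subclasses,
        # so an untrusted certificate ends up here and selects plain http.
        return "http"
```

With a hypothetical host, `choose_scheme(lambda: https_probe("https://my-aim-host:53800/"))` would return `"http"` against a server using a self-signed certificate, and the subsequent plain HTTP request to the TLS port gets disconnected. Passing `ca_file="server.crt"` to the probe lets verification succeed, so `"https"` is chosen instead.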
@mihran113 amazing, thank you very much!!
Hey folks, the changes for this have been published with the newest release of Aim, v3.25.0.
Please check out the new documentation section on client-side setup (it works the same way as in versions 3.17.5 and older):
https://aimstack.readthedocs.io/en/latest/using/remote_tracking.html#ssl-support
Let me know if everything works as expected or not 🙌