[ Setup/examples ] Initial Installation Issues - docker compose errors
Hello clearml team,
Congrats on the release of clearml-serving V2 🎉
I really wanted to check it out, but I'm running into difficulties with the basic setup and the scikit-learn example commands on my side.
I want to run the Installation and the Toy model (scikit-learn) deployment example.
I have a self-hosted ClearML Server deployed with the Helm chart on Kubernetes.
The environment variables from clearml-serving/docker/docker-compose.yml were defined in the myexemple.env file, which starts like this:
CLEARML_WEB_HOST="http://localhost:8080/"
CLEARML_API_HOST="http://localhost:8008/"
CLEARML_FILES_HOST="http://localhost:8081/"
Upon running docker-compose, both clearml-serving-inference and clearml-serving-statistics return errors:
Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4065110310>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
I think the issue comes from the communication with the Kafka service, but I do not know how to solve it. Has anyone encountered and solved this issue before, since it's the default installation from the docs?
I haven't found any related issues on any of the GitHub repos. Thanks for the help 🤖
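In case it helps with debugging: here is a minimal Python sketch (not ClearML code, just a plain TCP probe) to check whether each host from the .env file is reachable from wherever the code runs. Note that inside a container, localhost resolves to the container itself, not the Docker host, which can produce exactly this kind of "Connection refused" on /auth.login:

```python
import socket
from urllib.parse import urlparse

def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to the URL's host:port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # covers Errno 111 (refused) and -2/-3 (DNS failures)
        return False

# The three hosts from the .env file
for url in ["http://localhost:8080/", "http://localhost:8008/", "http://localhost:8081/"]:
    print(url, "reachable" if can_connect(url) else "NOT reachable")
```

Running this on the host and then inside the failing container (via docker exec) should show whether the URLs only resolve from the host.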
Hi @gaspard-met
I'll try to recreate your environment and help out. When you say you have a ClearML server running on Kubernetes, how is that running locally? Microk8s? Also, do you use the experiment manager with that same server? If so, can you share your ~/.clearml.conf file, so we can see which URLs the experiment manager uses to connect to the server? I suspect there is a discrepancy there.
If not, when you use a web browser, can you actually go to the server via localhost?
Hello @thepycoder
I'm running into the same error with clearml-serving-inference here, using minikube on Kubernetes.
To reproduce:
- Used minikube to create a single-node cluster:
$ minikube start --driver=docker \
--container-runtime=containerd \
--nodes=1
- Used helm to install both clearml and clearml-serving (helm repo already added):
$ helm install clearml allegroai/clearml
$ helm install clearml-serving allegroai/clearml-serving
- Everything seems to work fine:
$ kubectl get po
NAME READY STATUS RESTARTS AGE
alertmanager-84b874c6f8-nxnqm 1/1 Running 0 18h
clearml-apiserver-7b46876f44-gpm4v 1/1 Running 3 (8d ago) 8d
clearml-elastic-master-0 1/1 Running 0 8d
clearml-fileserver-5c968587b4-2zmqx 1/1 Running 0 8d
clearml-k8sagent-5d468b6d47-269qp 1/1 Running 0 5d19h
clearml-mongodb-6b94888687-r4x7d 1/1 Running 0 8d
clearml-redis-master-0 1/1 Running 1 (7d1h ago) 8d
clearml-serving-inference-85bcf97f69-w5b2b 1/1 Running 2 (171m ago) 18h
clearml-serving-statistics-6ffb8459bc-vhktv 1/1 Running 2 (171m ago) 18h
clearml-serving-triton-666f97b8d6-k8lsd 1/1 Running 2 (171m ago) 18h
clearml-webserver-7d86c649dd-txczl 1/1 Running 0 8d
grafana-84b7f5c559-wnfdx 1/1 Running 0 18h
kafka-cb849765-7kng5 1/1 Running 0 18h
prometheus-6f5868884b-9h5h8 1/1 Running 0 18h
zookeeper-6795454fbf-gqfjh 1/1 Running 0 18h
- Created credentials in the webserver under Settings/Workspace (from a browser), and used clearml-init to configure the ~/.clearml.conf file:
# ClearML SDK configuration file
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server: http://127.0.0.1:46555
web_server: http://127.0.0.1:38063
files_server: http://127.0.0.1:42347
# Credentials are generated using the webapp, http://127.0.0.1:45595/settings
# Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
credentials {"access_key":
......
which I am sure is correct because the webserver works fine, and the Git examples in /clearml/examples/ work.
- I tried the Git examples in /clearml-serving/examples/pytorch/ and created an endpoint:
$ clearml-serving model list
clearml-serving - CLI for launching ClearML serving engine
Notice! serving service ID not provided, selecting the first active service
List model serving and endpoints, control task id=1f38787f5f7a4ab6b860532369f0aa57
Info: syncing model endpoint configuration, state hash=253e8350252883f7e599572903a5cf63
Endpoints:
{
"test_pytorch_mnist/1": {
"engine_type": "triton",
"serving_url": "test_pytorch_mnist",
"model_id": "3ed0f8563b56482eb9726230f1171ef1",
"version": "1",
"preprocess_artifact": "py_code_test_pytorch_mnist_1",
"input_size": [
1,
28,
28
],
"input_type": "float32",
"input_name": "INPUT__0",
"output_size": [
-1,
10
],
"output_type": "float32",
"output_name": "OUTPUT__0",
"auxiliary_cfg": null
}
}
Model Monitoring:
{}
Canary:
{}
- Then I found the serving URL cannot be reached. Checked the logs of the clearml-serving-inference pod:
CLEARML_SERVING_TASK_ID=ClearML Serving Task ID
CLEARML_SERVING_PORT=8080
CLEARML_USE_GUNICORN=true
CLEARML_EXTRA_PYTHON_PACKAGES=
CLEARML_SERVING_NUM_PROCESS=2
CLEARML_SERVING_POLL_FREQ=1.0
CLEARML_DEFAULT_KAFKA_SERVE_URL=clearml-serving-kafka:9092
WEB_CONCURRENCY=
SERVING_PORT=8080
GUNICORN_NUM_PROCESS=2
GUNICORN_SERVING_TIMEOUT=
GUNICORN_MAX_REQUESTS=0
GUNICORN_EXTRA_ARGS=
UVICORN_SERVE_LOOP=asyncio
UVICORN_EXTRA_ARGS=
UVICORN_LOG_LEVEL=warning
CLEARML_DEFAULT_BASE_SERVE_URL=http://127.0.0.1:8080/serve
CLEARML_DEFAULT_TRITON_GRPC_ADDR=clearml-serving-triton:8001
Starting Gunicorn server
Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9b57d87610>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /auth.login
Retrying (Retry(total=238, connect=238, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9aefd01760>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /auth.login
Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9aefc207f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /auth.login
......
- Checked the entrypoint.sh in the clearml-serving-inference pod with kubectl exec:
......
else
echo "Starting Gunicorn server"
# start service
PYTHONPATH=$(pwd) python3 -m gunicorn \
--preload clearml_serving.serving.main:app \
--workers $GUNICORN_NUM_PROCESS \
--worker-class uvicorn.workers.UvicornWorker \
--max-requests $GUNICORN_MAX_REQUESTS \
--timeout $GUNICORN_SERVING_TIMEOUT \
--bind 0.0.0.0:$SERVING_PORT \
$GUNICORN_EXTRA_ARGS
fi
It seems this gunicorn app failed to communicate with something.
Thanks for any help! :)
Thanks for the detailed writeup @Muscle-Oliver !
So I've taken a look and it seems like a specific parameter is missing from the helm chart.
The URL http://127.0.0.1:8080/serve does not look like the correct one to connect to; in a Kubernetes cluster you'd usually use addresses other than localhost.
In order to set this IP address, you'll have to edit the following parameter in the serving docker-compose yaml file: https://github.com/allegroai/clearml-serving/blob/main/docker/docker-compose-triton-gpu.yml#L92
But it seems that the particular env var CLEARML_DEFAULT_BASE_SERVE_URL isn't exposed in the helm chart at all! So I'm adding @valeriano-manassero to the discussion as he is the maintainer of the helm charts. Using this parameter should allow you to set everything up properly :)
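As a sketch, the override in the compose file would look something like this (the service and variable names follow the linked file; the address is a placeholder for whatever is actually reachable in your setup):

```yaml
# docker/docker-compose-triton-gpu.yml (fragment, not the full file)
services:
  clearml-serving-inference:
    environment:
      # placeholder -- point this at wherever the serving endpoint
      # is reachable from inside the cluster
      CLEARML_DEFAULT_BASE_SERVE_URL: "http://<serving-host>:8080/serve"
```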
Thanks for the quick reply @thepycoder !
May I ask what problem the /auth.login at the end of the gunicorn startup log suggests?
As it goes:
Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9aefc207f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /auth.login
Also, I'm wondering whether this env var just indicates the final serving URL after gunicorn starts up successfully, rather than some destination it is now trying to connect to? :smile:
Thanks for any further update :) :coffee:
Hi, I just issued a PR mentioning this issue; can you please check it and let me know if this is what you are expecting?
Since that change is not breaking, I just merged the PR and released clearml-serving-0.4.0. Please let me know if this chart works for you.
Thanks for the update @valeriano-manassero !
So I also tried minikube start --driver=none as root, and helm-installed clearml-serving-0.4.0 as suggested. But everything came out the same.
The log of pod clearml-serving-inference still goes:
Retrying (Retry(total=234, connect=234, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff3de505880>: Failed to establish a new connection: [Errno -2] Name or service not known')': /auth.login
which means gunicorn failed to start? (Or it did, but is not working as expected.)
Then I ran kubectl exec into the clearml-serving-inference pod and manually executed clearml_serving/serving/entrypoint.sh.
I used Ctrl+C to terminate the process once the above error log appeared, which produced a more detailed output:
root@clearml-serving-inference-85bcf97f69-9jsdh:~/clearml# sh clearml_serving/serving/entrypoint.sh
CLEARML_SERVING_TASK_ID=ClearML Serving Task ID
CLEARML_SERVING_PORT=8080
CLEARML_USE_GUNICORN=true
EXTRA_PYTHON_PACKAGES=
CLEARML_SERVING_NUM_PROCESS=2
CLEARML_SERVING_POLL_FREQ=1.0
CLEARML_DEFAULT_KAFKA_SERVE_URL=clearml-serving-kafka:9092
CLEARML_DEFAULT_KAFKA_SERVE_URL=clearml-serving-kafka:9092
WEB_CONCURRENCY=
SERVING_PORT=8080
GUNICORN_NUM_PROCESS=2
GUNICORN_SERVING_TIMEOUT=
GUNICORN_EXTRA_ARGS=
UVICORN_SERVE_LOOP=asyncio
UVICORN_EXTRA_ARGS=
CLEARML_DEFAULT_BASE_SERVE_URL=http://127.0.0.1:8080/serve
CLEARML_DEFAULT_TRITON_GRPC_ADDR=clearml-serving-triton:8001
Starting Gunicorn server
Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5bfe7e4610>: Failed to establish a new connection: [Errno -2] Name or service not known')': /auth.login
Retrying (Retry(total=238, connect=238, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b984f2520>: Failed to establish a new connection: [Errno -2] Name or service not known')': /auth.login
^CTraceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/local/lib/python3.9/http/client.py", line 1285, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.9/http/client.py", line 1331, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.9/http/client.py", line 1280, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.9/http/client.py", line 1040, in _send_output
self.send(msg)
File "/usr/local/lib/python3.9/http/client.py", line 980, in send
self.connect()
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f5b984f27f0>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.9/site-packages/gunicorn/__main__.py", line 7, in <module>
run()
File "/usr/local/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/usr/local/lib/python3.9/site-packages/gunicorn/app/base.py", line 231, in run
super().run()
File "/usr/local/lib/python3.9/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/usr/local/lib/python3.9/site-packages/gunicorn/arbiter.py", line 58, in __init__
self.setup(app)
File "/usr/local/lib/python3.9/site-packages/gunicorn/arbiter.py", line 118, in setup
self.app.wsgi()
File "/usr/local/lib/python3.9/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.9/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/root/clearml/clearml_serving/serving/main.py", line 49, in <module>
serving_task = ModelRequestProcessor._get_control_plane_task(task_id=serving_service_task_id)
File "/root/clearml/clearml_serving/serving/model_request_processor.py", line 1094, in _get_control_plane_task
task = Task.get_task(task_id=task_id)
File "/usr/local/lib/python3.9/site-packages/clearml/task.py", line 796, in get_task
return cls.__get_task(
File "/usr/local/lib/python3.9/site-packages/clearml/task.py", line 3523, in __get_task
return cls(private=cls.__create_protection, task_id=task_id, log_to_backend=False)
File "/usr/local/lib/python3.9/site-packages/clearml/task.py", line 169, in __init__
super(Task, self).__init__(**kwargs)
File "/usr/local/lib/python3.9/site-packages/clearml/backend_interface/task/task.py", line 152, in __init__
super(Task, self).__init__(id=task_id, session=session, log=log)
File "/usr/local/lib/python3.9/site-packages/clearml/backend_interface/base.py", line 145, in __init__
super(IdObjectBase, self).__init__(session, log, **kwargs)
File "/usr/local/lib/python3.9/site-packages/clearml/backend_interface/base.py", line 39, in __init__
self._session = session or self._get_default_session()
File "/usr/local/lib/python3.9/site-packages/clearml/backend_interface/base.py", line 115, in _get_default_session
InterfaceBase._default_session = Session(
File "/usr/local/lib/python3.9/site-packages/clearml/backend_api/session/session.py", line 207, in __init__
self.refresh_token()
File "/usr/local/lib/python3.9/site-packages/clearml/backend_api/session/token_manager.py", line 112, in refresh_token
self._set_token(self._do_refresh_token(self.__token, exp=self.req_token_expiration_sec))
File "/usr/local/lib/python3.9/site-packages/clearml/backend_api/session/session.py", line 736, in _do_refresh_token
res = self._send_request(
File "/usr/local/lib/python3.9/site-packages/clearml/backend_api/session/session.py", line 358, in _send_request
res = self.__http_session.request(
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/site-packages/clearml/backend_api/utils.py", line 85, in send
return super(SessionWithTimeout, self).send(request, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 788, in urlopen
retries.sleep()
File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 432, in sleep
self._sleep_backoff()
File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 416, in _sleep_backoff
time.sleep(backoff)
KeyboardInterrupt
Does this help explain the connection error? Or does Task.init() have something to do with the reported ... Connection refused')': /auth.login? I'm not sure.
Thank you for any reply!
Just a question: I see clearml.defaultBaseServeUrl in the chart is still the default value. Did you try changing that URL to point to the right endpoint?
Thanks for the helm update. But I'm confused now :joy:
How can I get this clearml-serving-inference to start correctly, anyway?
I can infer from the logs that the startup problem may result from some sort of connection error, but I have no idea what exactly gunicorn was connecting to.
As @thepycoder suggested, the CLEARML_DEFAULT_BASE_SERVE_URL might not be localhost but some cluster IP. Then I checked all the services in the cluster, and only found one service on port 8080, which is clearml-serving-inference itself!
Does this mean the localhost is actually correct :question:
I really have no idea what the right endpoint for clearml.defaultBaseServeUrl is.
Here is the clearml-serving-inference gunicorn entrypoint.sh:
#!/bin/bash
# print configuration
echo CLEARML_SERVING_TASK_ID="$CLEARML_SERVING_TASK_ID"
echo CLEARML_SERVING_PORT="$CLEARML_SERVING_PORT"
echo CLEARML_USE_GUNICORN="$CLEARML_USE_GUNICORN"
echo EXTRA_PYTHON_PACKAGES="$EXTRA_PYTHON_PACKAGES"
echo CLEARML_SERVING_NUM_PROCESS="$CLEARML_SERVING_NUM_PROCESS"
echo CLEARML_SERVING_POLL_FREQ="$CLEARML_SERVING_POLL_FREQ"
echo CLEARML_DEFAULT_KAFKA_SERVE_URL="$CLEARML_DEFAULT_KAFKA_SERVE_URL"
echo CLEARML_DEFAULT_KAFKA_SERVE_URL="$CLEARML_DEFAULT_KAFKA_SERVE_URL"
SERVING_PORT="${CLEARML_SERVING_PORT:-8080}"
GUNICORN_NUM_PROCESS="${CLEARML_SERVING_NUM_PROCESS:-4}"
GUNICORN_SERVING_TIMEOUT="${GUNICORN_SERVING_TIMEOUT:-600}"
UVICORN_SERVE_LOOP="${UVICORN_SERVE_LOOP:-asyncio}"
# set default internal serve endpoint (for request pipelining)
CLEARML_DEFAULT_BASE_SERVE_URL="${CLEARML_DEFAULT_BASE_SERVE_URL:-http://127.0.0.1:$SERVING_PORT/serve}"
CLEARML_DEFAULT_TRITON_GRPC_ADDR="${CLEARML_DEFAULT_TRITON_GRPC_ADDR:-127.0.0.1:8001}"
# print configuration
echo WEB_CONCURRENCY="$WEB_CONCURRENCY"
echo SERVING_PORT="$SERVING_PORT"
echo GUNICORN_NUM_PROCESS="$GUNICORN_NUM_PROCESS"
echo GUNICORN_SERVING_TIMEOUT="$GUNICORN_SERVING_PORT"
echo GUNICORN_EXTRA_ARGS="$GUNICORN_EXTRA_ARGS"
echo UVICORN_SERVE_LOOP="$UVICORN_SERVE_LOOP"
echo UVICORN_EXTRA_ARGS="$UVICORN_EXTRA_ARGS"
echo CLEARML_DEFAULT_BASE_SERVE_URL="$CLEARML_DEFAULT_BASE_SERVE_URL"
echo CLEARML_DEFAULT_TRITON_GRPC_ADDR="$CLEARML_DEFAULT_TRITON_GRPC_ADDR"
# runtime add extra python packages
if [ ! -z "$EXTRA_PYTHON_PACKAGES" ]
then
python3 -m pip install $EXTRA_PYTHON_PACKAGES
fi
if [ -z "$CLEARML_USE_GUNICORN" ]
then
echo "Starting Uvicorn server"
PYTHONPATH=$(pwd) python3 -m uvicorn \
clearml_serving.serving.main:app --host 0.0.0.0 --port $SERVING_PORT --loop $UVICORN_SERVE_LOOP \
$UVICORN_EXTRA_ARGS
else
echo "Starting Gunicorn server"
# start service
PYTHONPATH=$(pwd) python3 -m gunicorn \
--preload clearml_serving.serving.main:app \
--workers $GUNICORN_NUM_PROCESS \
--worker-class uvicorn.workers.UvicornWorker \
--timeout $GUNICORN_SERVING_TIMEOUT \
--bind 0.0.0.0:$SERVING_PORT \
$GUNICORN_EXTRA_ARGS
fi
Maybe we can clear this up by reproducing gunicorn's startup process in clearml-serving-inference?
I probably found the issue:
there is likely a misconfiguration of the apiHost value in your helm chart installation.
If they are on the same cluster they should be:
apiHost: http://clearml-enterprise-apiserver:8008
filesHost: http://clearml-enterprise-fileserver:8081
webHost: http://clearml-enterprise-webserver:80
Once you no longer get connection errors, you can connect to the inference service simply by doing a port-forward with kubectl -n clearml port-forward svc/clearml-serving-inference 8080:8080.
Let me know if this helps.
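To sketch the pattern (clearml-enterprise above is just the release name from my setup; substitute the one you passed to helm install), the in-cluster hostnames are derived from the Helm release name:

```shell
# Sketch: in-cluster service hostnames follow the Helm release name.
# "clearml" is an assumed release name -- substitute the one you used.
RELEASE="clearml"
API_HOST="http://${RELEASE}-apiserver:8008"
FILES_HOST="http://${RELEASE}-fileserver:8081"
WEB_HOST="http://${RELEASE}-webserver:80"
printf '%s\n' "$API_HOST" "$FILES_HOST" "$WEB_HOST"
```

Always verify the actual names with kubectl get svc, since they depend on your release name and chart version.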
@valeriano-manassero Thanks! That's it!
Thanks to your reminder, I finally noticed that the configs of clearml-serving were all incorrect. :rofl:
Previously I installed clearml-serving via the helm install [RELEASE] [CHART] command line, and all the helm charts used the default configs from values.yaml, left unchecked.
I git pulled the clearml-serving repo and checked values.yaml, which reads:
clearml:
apiAccessKey: "ClearML API Access Key"
apiSecretKey: "ClearML API Secret Key"
apiHost: http://clearml-server-apiserver:8008
filesHost: http://clearml-server-fileserver:8081
webHost: http://clearml-server-webserver:80
servingTaskId: "ClearML Serving Task ID"
......
where all the Host addresses don't match my current clearml services (I installed clearml via helm, app version 1.4.0).
The correct services should be (version 1.4.0):
clearml-apiserver:8008
clearml-fileserver:8081
clearml-webserver:80
No more connection error!
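In case it helps others: instead of editing the chart's values.yaml, the same fix can be supplied as an override file (service names taken from my cluster above; double-check yours with kubectl get svc):

```yaml
# values.override.yaml (sketch; hostnames must match your cluster's services)
clearml:
  apiHost: http://clearml-apiserver:8008
  filesHost: http://clearml-fileserver:8081
  webHost: http://clearml-webserver:80
```

applied with helm upgrade --install clearml-serving allegroai/clearml-serving -f values.override.yaml.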
Hello, I have the same issue with connection errors.
On my Ubuntu machine:
- I successfully installed the clearml server, and it sits on my localhost:8080
On the same Ubuntu machine i tried:
- to run docker-compose from this tutorial - I wasn't able to deploy the serving-inference and serving-statistics parts of this compose - they can't connect and throw /auth.login errors
- to run the inference container from the toy model tutorial - I wasn't able to deploy it, because it can't connect and throws /auth.login errors at me
my example.env file:
CLEARML_WEB_HOST="http://localhost:80"
CLEARML_API_HOST="http://localhost:8008"
CLEARML_FILES_HOST="http://localhost:8081"
CLEARML_API_ACCESS_KEY="IEHHDEZ3HO2MNHYX5OAZ"
CLEARML_API_SECRET_KEY="IbIAqWWAjmWcNxk6uOlFqywuBIT350Dy03II77SE2wOaiAhl8T"
CLEARML_SERVING_TASK_ID="7b94d19189b84692b1450b00037dc45d"
my conf file:
# ClearML SDK configuration file
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server: http://localhost:8008
web_server: http://localhost:8080
files_server: http://localhost:8081
# Credentials are generated using the webapp, http://localhost:8080/settings
# Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
credentials {"access_key": "IEHHDEZ3HO2MNHYX5OAZ", "secret_key": "IbIAqWWAjmWcNxk6uOlFqywuBIT350Dy03II77SE2wOaiAhl8T"}
}
Hi @Mithmi, I host my main clearml server and my clearml serving server using different docker-compose files. I resolved this issue by putting the main and serving composes on the same network. I hope this stackoverflow answer will help you figure it out for your case.
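Roughly, the override can look like this (the network name clearml_backend is an assumption — compose usually names networks <project>_<network>, so check docker network ls for the real one):

```yaml
# docker-compose.override.yml for the serving stack (sketch)
networks:
  clearml_backend:          # assumed name; verify with `docker network ls`
    external: true
services:
  clearml-serving-inference:
    networks: [clearml_backend]
  clearml-serving-statistics:
    networks: [clearml_backend]
```

Once both stacks share a network, CLEARML_API_HOST can point at the API server's compose service name instead of localhost.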