aim
The docker container doesn't work out of the box
🐛 Bug
Hello. First of all, thank you very much for the crazy amount of work put into this.
Currently, it seems like the Docker setup fails. I get this message:
Aborted! '/opt/aim' is not a valid Aim repository. Do you want to initialize it? [y/N]:
To reproduce
docker run -d -p 43800:43800 -v /testFolder:/opt/aim aimstack/aim
Expected behavior
The container running and listening on port 43800
Environment
- Aim Version (e.g., 3.0.1) : None
- Python version : None
- pip version : None
- OS (e.g., Linux) : Ubuntu 22.04
- Any other relevant information : Latest Docker image
Additional context
Is there a way to add a "yes to everything" parameter (something like apt-get install -y)?
Thank you very much in advance.
hey @ilisparrow,
/testFolder should be the path of the parent directory where the .aim repo is located. Could you please double-check that the given path is correct?
For example, if you have:
/home
|-> /projects
|---> /ner-parsing
|-----> /.aim
the volume should be specified as -v /home/projects/ner-parsing:/opt/aim
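The rule above (mount the parent of .aim, not .aim itself) can be checked before starting the container. This is only a minimal sketch: find_mount_root is a hypothetical helper name, not part of Aim, and it only looks at the directory layout on the host.

```python
from pathlib import Path


def find_mount_root(path: str) -> Path:
    """Return the host directory that should be mounted at /opt/aim.

    Accepts either the parent directory or the .aim directory itself and
    returns the parent in both cases. Raises FileNotFoundError when no
    .aim directory is found (hypothetical helper, not an Aim API).
    """
    candidate = Path(path).expanduser()
    # The caller pointed directly at the .aim directory: step up one level.
    if candidate.name == ".aim" and candidate.is_dir():
        return candidate.parent
    # The caller pointed at the parent directory: use it as is.
    if (candidate / ".aim").is_dir():
        return candidate
    raise FileNotFoundError(f"no .aim directory found in {candidate}")
```

For the tree above, find_mount_root("/home/projects/ner-parsing/.aim") would return /home/projects/ner-parsing, which is the path to put on the left side of -v.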
Hello, thank you for the fast reply. I see what the problem is (I will test it as soon as I have access to a computer). There needs to be an initialised Aim repo on the host, so in my case:
- Go to /testFolder
- aim init
- Start the docker container pointing at /testFolder
But what if I can't install aim on the host? How can I init the repo without it?
Thanks again!
@ilisparrow The aimstackio/aim image runs Aim UI on the specified repo; it basically executes the aim up command. This is useful for running Aim UI on k8s clusters, for example.
It is assumed that the repo already exists and that a separate training process (outside of the docker image) writes training metadata to the repo.
Could you please share what your setup looks like? Do you run trainings inside docker images?
Hello, I had a deeper look. I wanted to set up an aimstack instance on one machine and then have other machines send data to it. If my understanding is correct, there needs to be a server receiving the data and writing it to the right folder, and then a docker UI displaying this data.
After creating a folder, running aim init in it, and then running the docker containers with a volume linked to the initialised folder, it now runs without any error. But for some reason, I can't get anything to display in the web browser. I get "the connection was reset".
It seems like the Docker part is not yet stable, so I will limit myself to a locally hosted server and UI, even if it's not the best solution.
To answer your question, I wanted to track all experiments on one server and run training on a different server. I run the training in a normal Python script, not in a Docker container.
Thank you still for your answers and work !
Best regards, Ilias.
I'm having the same problem.
I have a similar problem: even the simple docker command with no mounts, as specified in the docs, isn't working.
docker run --publish 43800:43800 aimstack/aim
I mean the command runs without error:
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
--------------------------------------------------------------------------
Aim UI collects anonymous usage analytics.
Read how to opt-out here:
https://aimstack.readthedocs.io/en/latest/community/telemetry.html
--------------------------------------------------------------------------
Running Aim UI on repo `<Repo#1073330378075837856 path=/opt/aim/.aim read_only=None>`
Open http://127.0.0.1:43800
Press Ctrl+C to exit
but when I go to localhost:43800 I see "page not found". The Dockerfile looks simple enough, so I don't know why it isn't working.
I'm on a Mac M1 and thought that might be the problem, so I tried building the image for linux/arm64/v8 with the command:
docker build . --platform linux/arm64/v8 --build-arg AIM_VERSION=3.14.4 -t my_aim
but that fails because aimrocks==0.2.1 can't be pip installed.
Also, for more context, my use case is similar to @ilisparrow's so far. I have made a docker-compose file like the one below. (I'm using a bind mount because I was just testing locally, but it could also be a named, Docker-managed volume.)
version: "3"
services:
  ui:
    image: aimstack/aim:3.14.4
    # platform: linux/amd64 # linux/arm64/v8
    container_name: aim_ui
    restart: unless-stopped
    ports:
      - 43800:43800
    volumes:
      - ~/workspace/outputs/aim:/.aim
    networks:
      - aim
  server:
    image: aimstack/aim:3.14.4
    container_name: aim_server
    restart: unless-stopped
    command: server
    ports:
      - 53800:53800
    volumes:
      - ~/workspace/outputs/aim:/.aim
    networks:
      - aim
networks:
  aim:
    driver: bridge
@cceyda Hi, I faced the same issue: docker run --publish 43800:43800 aimstack/aim works but localhost:43800 isn't available.
What helped for me was changing the host to 0.0.0.0, because that listens on all interfaces, unlike 127.0.0.1, which is only reachable from inside the container itself.
Try docker run --publish 43800:43800 aimstack/aim --host 0.0.0.0
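To tell the two cases apart from the host side, it helps to check for an actual HTTP response instead of trusting the "Open http://127.0.0.1:43800" log line. The sketch below is only an illustration: ui_responds is a made-up helper name, and it uses nothing but the Python standard library. It treats any HTTP status as "the UI is reachable" and a refused, reset, or timed-out connection as "not reachable".

```python
import http.client
import urllib.error
import urllib.request


def ui_responds(url: str, timeout: float = 3.0) -> bool:
    """Return True if any HTTP response (whatever the status) comes back from url."""
    # Ignore proxy settings from the environment so the request goes straight to the host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (for example 404).
        return True
    except (OSError, http.client.HTTPException):
        # Refused, reset, timed out, or a broken response: nothing usable is listening.
        return False
```

Calling ui_responds("http://localhost:43800") before and after adding --host 0.0.0.0 should show whether the change made the UI reachable through the published port.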
Building on top of @cceyda's solution:
Adding the key-value pair command: up --host 0.0.0.0 will make the UI work in a Docker container.
However, I'm clueless on how to actually post runs to the aim-server.
Nothing seems to work. Aim uses its own URL prefix (like aim://10.10.10.10:<port>), but this only confuses the Aim loggers during a Python run: they create folders starting with aim: in the current directory instead of trying to connect to the server...
version: "3"
services:
  ui:
    image: aimstack/aim:3.14.4
    command: up --host 0.0.0.0
    container_name: aim_ui
    restart: unless-stopped
    ports:
      - 43800:43800
    volumes:
      - ~/workspace/outputs/aim:/.aim
    networks:
      - aim
  server:
    image: aimstack/aim:3.14.4
    container_name: aim_server
    restart: unless-stopped
    command: server
    ports:
      - 53800:53800
    volumes:
      - ~/workspace/outputs/aim:/.aim
    networks:
      - aim
networks:
  aim:
    driver: bridge
@vanhumbeecka
To connect to the server, you should use the address of the host that runs the containers. If you're running Docker on your local machine, then any of localhost/127.0.0.1/0.0.0.0 with port 53800 should do.
E.g. run = aim.Run(repo='aim://127.0.0.1:53800') (at least that works for me)
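Since a mistyped address is easy to miss, the repo string can be sanity-checked before it is handed to aim.Run. This is just a sketch using the standard library: parse_aim_url is a hypothetical helper, not something Aim provides, and it only checks the shape of the string (aim:// scheme, a host, and a numeric port), not whether the server is reachable.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_aim_url(repo: str) -> Tuple[str, int]:
    """Split 'aim://host:port' into (host, port); raise ValueError if malformed."""
    parts = urlsplit(repo)
    if parts.scheme != "aim":
        raise ValueError(f"expected an aim:// URL, got {repo!r}")
    # parts.port itself raises ValueError when the port is not a valid number.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"host and port are both required in {repo!r}")
    return parts.hostname, parts.port
```

For the example above, parse_aim_url('aim://127.0.0.1:53800') returns the host and port as a pair, while a string without the aim:// prefix or without a port raises ValueError.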
About the directories there are 2 points:
- Does ~/workspace/outputs/aim exist on your machine? If not, you have to create it (or any other directory, it doesn't matter).
- I'm unsure if /.aim is the correct path inside a container. I use /opt/aim and this works.
So, in your case, it probably should be
volumes:
  - ~/workspace/outputs/aim:/opt/aim
Hi @feldlime
As I'm running these containers on a different machine in my LAN, I'm connecting to it using its IP address. E.g. aim://192.168.110:53800.
1. Does ~/workspace/outputs/aim exist on your machine? If not you have to create it (or any other, it doesn't matter)
2. I'm unsure if /.aim is the correct path inside a container. I use /opt/aim and this works
I actually have a different storage location; I had just copy-pasted your code on GitHub, hence the confusion. Below is the complete setup I now have (with the corrected volume paths as I'm currently using them).
Now the containers won't even start properly. When setting the path inside the container to /opt/aim, I'm getting the following error:
Aborted!
'/opt/aim' is not a valid Aim repository. Do you want to initialize it? [y/N]:
The same error appears both with and without the --repo=/opt/aim option on both commands.
version: "3"
services:
  ui:
    image: aimstack/aim:latest
    container_name: aim_ui
    restart: unless-stopped
    command: up --host=0.0.0.0 --repo=/opt/aim
    volumes:
      - /volume1/docker/aim:/opt/aim
    ports:
      - 43800:43800
    networks:
      - aim
  server:
    image: aimstack/aim:latest
    container_name: aim_server
    restart: unless-stopped
    command: server --repo=/opt/aim
    ports:
      - 53800:53800
    volumes:
      - /volume1/docker/aim:/opt/aim
    networks:
      - aim
networks:
  aim:
    driver: bridge
EDIT
After manually adding an .aim directory inside my /volume1/docker/aim directory, the error message changed to:
'/opt/aim' requires upgrade. Do you want to run upgrade automatically? [y/N]:
I'm guessing I need to find a way to call aim init inside the docker container? I would assume the aim-docker image would take care of this somehow?
I've updated my docker-compose.yml file to the one below.
The .aim repo is now initialized through docker, but still no luck getting it working, as I'm now getting cryptic Python errors in the aim-ui.
docker-compose file:
version: "3"
services:
  init:
    image: aimstack/aim:latest
    container_name: aim_init
    restart: "no"
    command: init
    volumes:
      - /volume1/docker/aim:/opt/aim
    networks:
      - aim
  ui:
    image: aimstack/aim:latest
    container_name: aim_ui
    restart: on-failure
    command: up --host=0.0.0.0
    volumes:
      - /volume1/docker/aim:/opt/aim
    ports:
      - 43800:43800
    networks:
      - aim
    depends_on:
      - init
  server:
    image: aimstack/aim:latest
    container_name: aim_server
    restart: on-failure
    command: server
    ports:
      - 53800:53800
    volumes:
      - /volume1/docker/aim:/opt/aim
    networks:
      - aim
    depends_on:
      - init
errors in aim-ui logs
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 84, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 443, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/gzip.py", line 24, in __call__
await responder(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/gzip.py", line 44, in __call__
await self.app(scope, receive, self.send_with_gzip)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 69, in app
await response(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 273, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
raise exceptions[0]
File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 269, in wrap
await func()
File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 258, in stream_response
async for chunk in self.body_iterator:
File "/usr/local/lib/python3.9/site-packages/aim/web/api/runs/utils.py", line 259, in run_search_result_streamer
run_dict[run.hash]['traces'] = run.collect_sequence_info(sequence_types='metric')
File "/usr/local/lib/python3.9/site-packages/aim/sdk/run.py", line 661, in collect_sequence_info
ctx_dict = self.idx_to_ctx(idx).to_dict()
File "/usr/local/lib/python3.9/site-packages/aim/sdk/run.py", line 335, in idx_to_ctx
return self._tracker.idx_to_ctx(idx)
File "/usr/local/lib/python3.9/site-packages/aim/sdk/tracker.py", line 80, in idx_to_ctx
ctx = Context(self.meta_tree['contexts', idx])
File "aim/storage/treeview.py", line 51, in aim.storage.treeview.TreeView.__getitem__
File "aim/storage/containertreeview.py", line 73, in aim.storage.containertreeview.ContainerTreeView.collect
KeyError: "No key ('contexts', -4504863053055089774) is present."
@vanhumbeecka Those are the right steps: you need to init the aim directory.
My guess is that the directory is still not initialized when aim_ui and aim_server start.
With the depends_on instruction, docker-compose only waits until the init container has been started, not until it has finished, and aim requires a couple of seconds to init the directory.
My suggestion: you can manually add a sleep instruction to the ui and server containers, e.g.
entrypoint: >
  /bin/sh -c "
  /bin/sleep 5;
  aim server;
  "
Thanks for the catch @feldlime, adding --host 0.0.0.0 was the key!
I have tested the docker-compose file below, and it worked for me both locally and on a remote machine.
@vanhumbeecka You have to add command: server --host 0.0.0.0 to the server as well,
and make sure the directory you are mounting in the volumes exists. It is ~/aim/training_logs in the example below.
Also check that the versions match between your client (the Python environment that is logging runs) and the server (the docker image).
version: "3"
services:
  ui:
    image: aimstack/aim:3.16.0
    container_name: aim_ui
    restart: unless-stopped
    command: up --host 0.0.0.0
    ports:
      - 43800:43800
    volumes:
      - ~/aim/training_logs:/opt/aim
    networks:
      - aim
  server:
    image: aimstack/aim:3.16.0
    container_name: aim_server
    restart: unless-stopped
    command: server --host 0.0.0.0
    ports:
      - 53800:53800
    volumes:
      - ~/aim/training_logs:/opt/aim
    networks:
      - aim
networks:
  aim:
    driver: bridge
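The version check mentioned above can be done by hand, or roughly like this. It is only a sketch under a couple of assumptions: the helper names are made up, the image tag is assumed to be the Aim version (as in aimstack/aim:3.16.0), and a tag like latest or a missing tag is treated as "cannot confirm a match".

```python
from importlib import metadata
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference like 'aimstack/aim:3.16.0', or None."""
    # Only look at the last path component so a registry port is not mistaken for a tag.
    name = image.rsplit("/", 1)[-1]
    return name.split(":", 1)[1] if ":" in name else None


def versions_match(client_version: str, image: str) -> bool:
    """True only when the image tag is exactly the client version."""
    return image_tag(image) == client_version


def installed_aim_version() -> Optional[str]:
    """Version of the locally installed 'aim' package, or None if it is not installed."""
    try:
        return metadata.version("aim")
    except metadata.PackageNotFoundError:
        return None
```

On the client machine, versions_match(installed_aim_version() or "", "aimstack/aim:3.16.0") would then tell whether the logging side and the image in the compose file are on the same release.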
Here is a fake run to test remote connection:
from aim import Run

# Replace [remote_ip] with your tracking server IP/hostname
aim_run = Run(repo='aim://[remote_ip]:53800',
              experiment="docker_remote_test")
# Log run parameters
aim_run['params'] = {
    'learning_rate': 0.001,
    'batch_size': 32,
}
aim_run.track(5, name='loss', epoch=0,
              context={'subset': 'train'})
aim_run.track(4, name='loss', epoch=1,
              context={'subset': 'train'})
aim_run.track(3, name='loss', epoch=2,
              context={'subset': 'train'})
aim_run.track(2, name='loss', epoch=3,
              context={'subset': 'train'})
@cceyda I haven't added the --host 0.0.0.0 to the server explicitly, because it's the default host when running with server, so that part is fine.
@feldlime I actually got it working by splitting the 2 out (init step from server and ui). That way I don't need to deal with timeouts and just do it manually. After initializing through docker with init, and later starting the compose file with server and ui, everything works and I don't see those weird python errors anymore. So it seems initializing first (and not at the same time as booting up server and ui) was the trick - otherwise the data could become corrupted.
Thanks for all the feedback and help!
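For anyone who would rather not run the init step by hand or rely on a fixed sleep, the ordering problem discussed above could also be handled by polling for the repo before starting the UI or server. This is only a sketch: wait_for_repo is a hypothetical helper, and it assumes that the appearance of the .aim directory is a good enough signal, which may not hold if init is still writing into it.

```python
import time
from pathlib import Path


def wait_for_repo(root: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until <root>/.aim exists as a directory or the timeout expires.

    Returns True if the directory showed up in time, False otherwise.
    """
    marker = Path(root) / ".aim"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if marker.is_dir():
            return True
        time.sleep(interval)
    # One last check in case the directory appeared during the final sleep.
    return marker.is_dir()
```

A wrapper script could call wait_for_repo("/opt/aim") and only then start aim up or aim server, instead of sleeping for a fixed number of seconds.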