The docker container doesn't work out of the box

Open ilisparrow opened this issue 3 years ago • 19 comments
🐛 Bug

Hello. First of all, thank you very much for the crazy amount of work put into this.

Currently, the Docker setup seems to fail. I get this message: Aborted! '/opt/aim' is not a valid Aim repository. Do you want to initialize it? [y/N]:

To reproduce

docker run -d -p 43800:43800 -v /testFolder:/opt/aim aimstack/aim

Expected behavior

The container running and listening on port 43800

Environment

  • Aim Version (e.g., 3.0.1) : None
  • Python version : None
  • pip version : None
  • OS (e.g., Linux) : Ubuntu 22.04
  • Any other relevant information Latest Docker image

Additional context

Is there a way to add a "yes to everything" parameter (something like apt-get install -y)?

Thank you very much in advance.

ilisparrow avatar Jul 13 '22 16:07 ilisparrow

hey @ilisparrow, /testFolder is the path of the parent directory where .aim repo is located, could you please double check if the given path is correct?

gorarakelyan avatar Jul 14 '22 08:07 gorarakelyan

For example, if you have:

/home
|-> /projects
|---> /ner-parsing
|-----> /.aim

the volume should be specified as -v /home/projects/ner-parsing:/opt/aim
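
The rule above (mount the directory that contains .aim, not .aim itself) can be checked before starting the container. This is a minimal sketch, not part of Aim: the helper name volume_flag is hypothetical, and /opt/aim is taken from the command in the original report.

```python
from pathlib import Path


def volume_flag(host_dir: str, container_dir: str = "/opt/aim") -> str:
    """Return the `-v` argument for a host directory that contains `.aim`.

    Raises ValueError when `.aim` is missing, which is the situation that
    leads to the "not a valid Aim repository" prompt inside the container.
    """
    parent = Path(host_dir).expanduser().resolve()
    if not (parent / ".aim").is_dir():
        raise ValueError(f"no .aim directory found inside {parent}")
    return f"-v {parent}:{container_dir}"


if __name__ == "__main__":
    # Example path from the comment above; adjust to your own project.
    try:
        print(volume_flag("/home/projects/ner-parsing"))
    except ValueError as err:
        print(f"cannot mount: {err}")
```

If the check fails, either the path points at the wrong level of the tree or the repo has not been initialised yet.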

gorarakelyan avatar Jul 14 '22 08:07 gorarakelyan

Hello, thank you for the fast reply. I see what the problem is (I will test it as soon as I have access to a computer). There needs to be an initialised Aim repo on the host, so in my case:

  • Go to /testFolder
  • Run aim init
  • Start the Docker container pointing at /testFolder

But what if I can't install Aim on the host? How can I initialise the repo without it?

Thanks again!

ilisparrow avatar Jul 14 '22 09:07 ilisparrow

@ilisparrow The aimstack/aim image runs Aim UI on the specified repo; it basically executes the aim up command. This is useful for running Aim UI on k8s clusters, for example. It is assumed that the repo already exists and that a separate training process (outside of the Docker image) writes training metadata to the repo.

Could you please share what your setup looks like? Do you run trainings inside Docker images?

gorarakelyan avatar Jul 14 '22 18:07 gorarakelyan

Hello, I had a deeper look. I wanted to set up an Aim instance on one machine and have other machines send data to it. If my understanding is correct, there needs to be a server receiving the data and writing it to the right folder, and then a Docker UI container displaying this data.

After creating a folder, running aim init in it, and then starting the Docker containers with a volume linked to the initialised folder, everything now runs without any error. But for some reason, nothing is displayed in the web browser; I get a "connection was reset" error.
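
A quick way to tell "the container is running" apart from "the UI is actually answering" is to request the page from a script. This is only a sketch using the standard library; http_status is a hypothetical helper, and port 43800 is the UI port used throughout this thread. A published Docker port can accept the TCP connection and still reset it, so the check makes a real HTTP request.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no HTTP response arrives."""
    # Bypass any configured proxy so the request goes straight to the host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status (e.g. 404).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, reset, or timed out: nothing is serving HTTP there.
        return None


if __name__ == "__main__":
    print(http_status("http://127.0.0.1:43800/"))
```

A result of None with a running container matches the "connection was reset" symptom; a numeric status means something is serving HTTP on that port.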

It seems the Docker part is not yet stable, so I will limit myself to a locally hosted server and UI, even if it's not the best solution.

To answer your question: I wanted to track all experiments on one server and run the training on a different server. I run the training in a normal Python script, not in Docker.

Thank you still for your answers and work !

Best regards, Ilias.

ilisparrow avatar Jul 18 '22 14:07 ilisparrow

I'm having the same problem.

hkang-vuno avatar Nov 07 '22 10:11 hkang-vuno

I have a similar problem: even the simple docker command with no mounts, as specified in the docs, isn't working:

docker run --publish 43800:43800 aimstack/aim

The command itself runs without error:

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
--------------------------------------------------------------------------
                Aim UI collects anonymous usage analytics.                
                        Read how to opt-out here:                         
    https://aimstack.readthedocs.io/en/latest/community/telemetry.html    
--------------------------------------------------------------------------
Running Aim UI on repo `<Repo#1073330378075837856 path=/opt/aim/.aim read_only=None>`
Open http://127.0.0.1:43800
Press Ctrl+C to exit

But when I go to localhost:43800 I see "page not found". The Dockerfile looks simple enough, so I don't know why it isn't working.

I'm on a Mac M1 and thought that might be the problem, so I tried building the image for linux/arm64/v8 with the command docker build . --platform linux/arm64/v8 --build-arg AIM_VERSION=3.14.4 -t my_aim, but that fails because aimrocks==0.2.1 can't be pip installed.
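
To see whether a host will hit the same platform warning, the host architecture can be compared with the image platform. The sketch below is an illustration only: the mapping table and both function names are assumptions, and linux/amd64 is the image platform reported in the warning above. The warning by itself does not prove the image cannot run, since Docker may fall back to emulation.

```python
import platform

# Assumed mapping from platform.machine() values to Docker architecture names.
_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def docker_platform(machine: str = "") -> str:
    """Return the Docker platform string for a machine name (default: this host)."""
    key = (machine or platform.machine()).lower()
    if key not in _DOCKER_ARCH:
        raise ValueError(f"unknown architecture: {key}")
    return f"linux/{_DOCKER_ARCH[key]}"


def needs_emulation(image_platform: str, machine: str = "") -> bool:
    """True when the image platform differs from the host platform."""
    return docker_platform(machine) != image_platform


if __name__ == "__main__":
    print(docker_platform(), needs_emulation("linux/amd64"))
```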

cceyda avatar Nov 23 '22 06:11 cceyda

Also, for more context, my use case is similar to @ilisparrow's so far. I have made a docker-compose file like the one below. (I'm using a bind mount because I was only testing locally, but it could also be a named, Docker-managed volume.)

version: "3"
services:
  ui:
    image: aimstack/aim:3.14.4
    # platform: linux/amd64 # linux/arm64/v8
    container_name: aim_ui
    restart: unless-stopped
    ports:
      - 43800:43800
    volumes:
    - ~/workspace/outputs/aim:/.aim
    networks:
      - aim

  server:
    image: aimstack/aim:3.14.4
    container_name: aim_server
    restart: unless-stopped
    command: server
    ports:
      - 53800:53800
    volumes:
    - ~/workspace/outputs/aim:/.aim
    networks:
      - aim

networks:
  aim:
    driver: bridge

cceyda avatar Nov 23 '22 06:11 cceyda

@cceyda Hi, I faced the same issue: docker run --publish 43800:43800 aimstack/aim works, but localhost:43800 isn't available. What helped for me was changing the host to 0.0.0.0, because it listens on all interfaces, unlike 127.0.0.1, which is only reachable from inside the container.

Try docker run --publish 43800:43800 aimstack/aim --host 0.0.0.0
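
The reasoning behind this fix can be written down as a small check. This is a sketch of the general Docker networking rule, not Aim code, and the function name is hypothetical: a published port forwards traffic to the container's network interface, so a server bound to the container's loopback address never sees it.

```python
import ipaddress


def reachable_via_published_port(bind_host: str) -> bool:
    """Return False when a server bound to bind_host only listens on loopback."""
    if bind_host == "localhost":
        return False
    return not ipaddress.ip_address(bind_host).is_loopback


if __name__ == "__main__":
    for host in ("127.0.0.1", "0.0.0.0"):
        print(host, reachable_via_published_port(host))
```

With the default 127.0.0.1 the UI is only reachable from inside the container, which matches the behaviour reported above.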

feldlime avatar Feb 16 '23 05:02 feldlime

Building on top of @cceyda's solution: adding the key-value pair command: up --host 0.0.0.0 makes the UI work in a Docker container. However, I'm clueless about how to actually post runs to the Aim server. Nothing seems to work. It uses its own prefix (like aim://10.10.10.10:<port>), but this only confuses the Aim loggers during a Python run: they create folders starting with aim: in the current directory instead of trying to connect to the server...

version: "3"
services:
  ui:
    image: aimstack/aim:3.14.4
    command: up --host 0.0.0.0
    container_name: aim_ui
    restart: unless-stopped
    ports:
      - 43800:43800
    volumes:
    - ~/workspace/outputs/aim:/.aim
    networks:
      - aim

  server:
    image: aimstack/aim:3.14.4
    container_name: aim_server
    restart: unless-stopped
    command: server
    ports:
      - 53800:53800
    volumes:
    - ~/workspace/outputs/aim:/.aim
    networks:
      - aim

networks:
  aim:
    driver: bridge

vanhumbeecka avatar Feb 17 '23 13:02 vanhumbeecka

@vanhumbeecka To connect to the server, you should use your host's address. If you're running Docker on a local machine, then any of localhost/127.0.0.1/0.0.0.0 with port 53800 should do, e.g. run = aim.Run(repo='aim://127.0.0.1:53800') (at least that works for me).
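
Since a mistyped address is easy to miss, the repo string can be built and sanity-checked before it is handed to aim.Run. This is a minimal sketch: aim_remote_repo is a hypothetical helper, and 53800 is the server port used in this thread.

```python
import ipaddress


def aim_remote_repo(host: str, port: int = 53800) -> str:
    """Build an aim:// repo string after basic validation of host and port."""
    if not host or "/" in host or ":" in host:
        raise ValueError(f"invalid host: {host!r}")
    # If the host looks like a dotted IPv4 address, require four valid octets.
    if host.replace(".", "").isdigit():
        ipaddress.IPv4Address(host)  # raises a ValueError subclass if malformed
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"aim://{host}:{port}"


if __name__ == "__main__":
    print(aim_remote_repo("127.0.0.1"))
```

The resulting string is what would be passed as the repo argument, as in the aim.Run call above.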

About the directories, there are two points:

  1. Does ~/workspace/outputs/aim exist on your machine? If not, you have to create it (or any other directory, it doesn't matter which).
  2. I'm unsure whether /.aim is the correct path inside a container. I use /opt/aim and this works. So, in your case, it should probably be:
volumes:
    - ~/workspace/outputs/aim:/opt/aim

feldlime avatar Feb 18 '23 14:02 feldlime

Hi @feldlime

As I'm running these containers on a different machine in my LAN, I'm connecting to it using its IP address, e.g. aim://192.168.110:53800.

1. Does ~/workspace/outputs/aim exist on your machine? If not you have to create it (or any other, it doesn't matter)
2. I'm unsure if /.aim is the correct path inside a container. I use /opt/aim and this works

I actually use a different storage location; I had only copy-pasted the code from GitHub, hence the confusion. Below is the complete setup I now have (with the volume paths corrected to the ones I'm currently using).

Now the containers won't even start properly. When setting the path inside the container to /opt/aim, I get the following error:

Aborted!

'/opt/aim' is not a valid Aim repository. Do you want to initialize it? [y/N]: 

The same error appears both with and without the --repo=/opt/aim option on both commands.

version: "3"
services:

  ui:
    image: aimstack/aim:latest
    container_name: aim_ui
    restart: unless-stopped
    command: up --host=0.0.0.0 --repo=/opt/aim
    volumes:
      - /volume1/docker/aim:/opt/aim
    ports:
      - 43800:43800
    networks:
      - aim

  server:
    image: aimstack/aim:latest
    container_name: aim_server
    restart: unless-stopped
    command: server --repo=/opt/aim
    ports:
      - 53800:53800
    volumes:
      - /volume1/docker/aim:/opt/aim
    networks:
      - aim

networks:
  aim:
    driver: bridge

EDIT: After manually adding an .aim directory inside my /volume1/docker/aim directory, the error message changed to:

'/opt/aim' requires upgrade. Do you want to run upgrade automatically? [y/N]: 

I'm guessing I need to find a way to call aim init inside the docker container? I would assume the aim-docker image would take care of this somehow?

vanhumbeecka avatar Feb 18 '23 20:02 vanhumbeecka

I've updated my docker-compose.yml file to the one below. The .aim repo is now initialized through Docker, but still no luck getting it working, as I'm now getting cryptic Python errors in the aim-ui logs.

docker-compose file:

version: "3"
services:

  init:
    image: aimstack/aim:latest
    container_name: aim_init
    restart: "no"
    command: init
    volumes:
      - /volume1/docker/aim:/opt/aim
    networks:
      - aim

  ui:
    image: aimstack/aim:latest
    container_name: aim_ui
    restart: on-failure
    command: up --host=0.0.0.0
    volumes:
      - /volume1/docker/aim:/opt/aim
    ports:
      - 43800:43800
    networks:
      - aim
    depends_on:
      - init

  server:
    image: aimstack/aim:latest
    container_name: aim_server
    restart: on-failure
    command: server
    ports:
      - 53800:53800
    volumes:
      - /volume1/docker/aim:/opt/aim
    networks:
      - aim
    depends_on:
      - init

Errors in aim-ui logs:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 443, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/gzip.py", line 24, in __call__
    await responder(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/gzip.py", line 44, in __call__
    await self.app(scope, receive, self.send_with_gzip)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 69, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 273, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 269, in wrap
    await func()
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 258, in stream_response
    async for chunk in self.body_iterator:
  File "/usr/local/lib/python3.9/site-packages/aim/web/api/runs/utils.py", line 259, in run_search_result_streamer
    run_dict[run.hash]['traces'] = run.collect_sequence_info(sequence_types='metric')
  File "/usr/local/lib/python3.9/site-packages/aim/sdk/run.py", line 661, in collect_sequence_info
    ctx_dict = self.idx_to_ctx(idx).to_dict()
  File "/usr/local/lib/python3.9/site-packages/aim/sdk/run.py", line 335, in idx_to_ctx
    return self._tracker.idx_to_ctx(idx)
  File "/usr/local/lib/python3.9/site-packages/aim/sdk/tracker.py", line 80, in idx_to_ctx
    ctx = Context(self.meta_tree['contexts', idx])
  File "aim/storage/treeview.py", line 51, in aim.storage.treeview.TreeView.__getitem__
  File "aim/storage/containertreeview.py", line 73, in aim.storage.containertreeview.ContainerTreeView.collect
KeyError: "No key ('contexts', -4504863053055089774) is present."

vanhumbeecka avatar Feb 18 '23 20:02 vanhumbeecka

@vanhumbeecka Those are the right steps: you do need to init the Aim directory.

My guess is that the directory is not yet initialised when aim_ui and aim_server start. With the depends_on instruction, docker-compose only waits until the init container has been started, not until it has finished, and Aim needs a couple of seconds to initialise the directory.

My suggestion: manually add a sleep instruction to the ui and server containers, e.g.

    entrypoint: >
      /bin/sh -c "
        /bin/sleep 5;
        aim server;
      "

feldlime avatar Feb 19 '23 03:02 feldlime
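
The fixed five-second sleep suggested above could be replaced by polling for the repo directory with a timeout. This is only a sketch: wait_for_repo is a hypothetical helper, and it assumes that the presence of the .aim directory is a good enough readiness signal, which this thread does not confirm (the directory may exist before initialisation has fully finished).

```python
import time
from pathlib import Path


def wait_for_repo(repo_dir: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until repo_dir/.aim exists; return False if the timeout expires."""
    marker = Path(repo_dir) / ".aim"
    deadline = time.monotonic() + timeout
    while True:
        if marker.is_dir():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # /opt/aim is the mount point used in the compose files in this thread.
    print(wait_for_repo("/opt/aim", timeout=5.0))
```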

Thanks for the catch @feldlime, adding --host 0.0.0.0 was the key! I have tested the docker-compose file below and it works for me both locally and on a remote machine. @vanhumbeecka You have to add command: server --host 0.0.0.0 to the server as well, and make sure the directory you are mounting in the volumes exists (~/aim/training_logs in the example below). Also check that the versions match between your client (the Python environment that logs runs) and the server (the Docker image).

version: "3"

services:
  ui:
    image: aimstack/aim:3.16.0
    container_name: aim_ui
    restart: unless-stopped
    command: up --host 0.0.0.0
    ports:
      - 43800:43800
    volumes:
    - ~/aim/training_logs:/opt/aim
    networks:
      - aim

  server:
    image: aimstack/aim:3.16.0
    container_name: aim_server
    restart: unless-stopped
    command: server --host 0.0.0.0
    ports:
      - 53800:53800
    volumes:
    - ~/aim/training_logs:/opt/aim
    networks:
      - aim

networks:
  aim:
    driver: bridge
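
The version check mentioned above can be scripted by comparing the client version with the tag of the image in the compose file. This is a sketch with hypothetical helper names; treating only an exact match as valid is a conservative assumption, and the client version is passed in as a string rather than read from the installed package.

```python
def image_tag(image: str) -> str:
    """Return the tag of a Docker image reference, or 'latest' if none is given."""
    _, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        # No colon at all, or the colon belongs to a registry port.
        return "latest"
    return tag


def versions_match(client_version: str, image: str) -> bool:
    """True when the client version equals the pinned image tag."""
    tag = image_tag(image)
    if tag == "latest":
        raise ValueError("image is not pinned to a version, cannot compare")
    return client_version.strip() == tag


if __name__ == "__main__":
    print(versions_match("3.16.0", "aimstack/aim:3.16.0"))
```

Pinning the image tag, as the compose file above does, is what makes this comparison possible in the first place.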

Here is a fake run to test remote connection:

from aim import Run

# Replace [remote_ip] with your tracking server IP/hostname
aim_run = Run(repo='aim://[remote_ip]:53800',
              experiment='docker_remote_test')

# Log run parameters
aim_run['params'] = {
    'learning_rate': 0.001,
    'batch_size': 32,
}

# Log a decreasing fake loss for four epochs
for epoch, loss in enumerate([5, 4, 3, 2]):
    aim_run.track(loss, name='loss', epoch=epoch,
                  context={'subset': 'train'})

cceyda avatar Feb 22 '23 10:02 cceyda

@cceyda I haven't added the --host 0.0.0.0 to the server explicitly, because it's the default host when running with server, so that part is fine.

@feldlime I actually got it working by splitting the 2 out (init step from server and ui). That way I don't need to deal with timeouts and just do it manually. After initializing through docker with init, and later starting the compose file with server and ui, everything works and I don't see those weird python errors anymore. So it seems initializing first (and not at the same time as booting up server and ui) was the trick - otherwise the data could become corrupted. Thanks for all the feedback and help!
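
The two-step order that worked here (initialise first, start ui and server afterwards) can be sketched as a list of commands. The helper names are hypothetical, the host path and image tag are examples taken from this thread, and the use of docker compose (rather than docker-compose) is an assumption; the script only prints the commands and does not run them.

```python
import shlex
from typing import List


def init_command(host_dir: str, image: str = "aimstack/aim:3.16.0",
                 container_dir: str = "/opt/aim") -> List[str]:
    """One-off container that runs `init` against the mounted directory."""
    return ["docker", "run", "--rm",
            "-v", f"{host_dir}:{container_dir}",
            image, "init"]


def startup_commands(host_dir: str, image: str = "aimstack/aim:3.16.0") -> List[List[str]]:
    """Commands in the order that worked: init first, then ui and server."""
    return [
        init_command(host_dir, image),
        ["docker", "compose", "up", "-d", "ui", "server"],
    ]


if __name__ == "__main__":
    for cmd in startup_commands("/volume1/docker/aim"):
        print(shlex.join(cmd))
```

Running the first command to completion before the second avoids the race between init and the ui/server containers described earlier.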

vanhumbeecka avatar Feb 25 '23 12:02 vanhumbeecka