hagrid 0.1.9 fails at starting a node on a Raspberry Pi

Open Rene36 opened this issue 3 years ago • 3 comments

Description

After running hagrid launch local_node domain on a Raspberry Pi 4 (ARM architecture), the docker build fails at the tailscale container, which is responsible for setting up the VPN service. The build gives the following error:

 => [openmined/grid-vpn-tailscale:0.6.0 stage-0 1/7] FROM docker.io/shaynesweeney/tailscale:l  10.0s
 => => resolve docker.io/shaynesweeney/tailscale:latest@sha256:b7fd6dcb59b54630b0acb1998d5a235  0.1s
 => => sha256:3748bda6edb3379fc8667761cf2bcc93d877c74c5828c694459bc2fb8a0f1f76 2.00MB / 2.00MB  0.5s
 => => sha256:b7fd6dcb59b54630b0acb1998d5a235e3e6d958a32f47835fd53935044d74bea 951B / 951B      0.0s
 => => sha256:138964341e09acd46a587faff53fbe04446989e2e459b5f1a5088ebe1864c322 1.92kB / 1.92kB  0.0s
 => => sha256:e4b4f71606ce42fa39c23f7cb94b88d454c762b7d761d064525ccff4ab6290 15.33MB / 15.33MB  1.7s
 => => sha256:a0d0a0d46f8b52473982a3c466318f479767577551a53ffc9074c9fa7035982e 2.81MB / 2.81MB  0.8s
 => => extracting sha256:a0d0a0d46f8b52473982a3c466318f479767577551a53ffc9074c9fa7035982e       0.5s
 => => extracting sha256:3748bda6edb3379fc8667761cf2bcc93d877c74c5828c694459bc2fb8a0f1f76       0.6s
 => => extracting sha256:e4b4f71606ce42fa39c23f7cb94b88d454c762b7d761d064525ccff4ab629058       1.4s
 => [openmined/grid-backend:0.6.0 internal] load build context                                  0.6s
 => => transferring context: 2.71MB                                                             0.5s
 => CANCELED [openmined/grid-backend:0.6.0 build 1/9] FROM docker.io/library/python:3.9.9-sli  20.9s
 => => resolve docker.io/library/python:3.9.9-slim@sha256:d67e4b3e185208a010e0d06dd1f655292dd9  0.1s
 => => sha256:d67e4b3e185208a010e0d06dd1f655292dd92c5dd08a64b1c59a0acbd387b1e9 1.86kB / 1.86kB  0.0s
 => => sha256:f24ca6962ecb3d115207094b4b5b9e216f4502e6a5d50990b03d8798b3a07561 1.37kB / 1.37kB  0.0s
 => => sha256:0e6609df29e4aebc8716fd168776bb538ead592c681a3c9cb19f55287e65a31d 7.89kB / 7.89kB  0.0s
 => => sha256:968621624b326084ed82349252b333e649eaab39f71866edb2b9a4f8472836 30.06MB / 30.06MB  7.9s
 => => sha256:344249d09d750370bbe46685f9b5f83cf4de8efd1dfe2ac675fadd18fe57 859.08kB / 859.08kB  1.0s
 => => sha256:4b1766cb00dac296fd5f7bd412ccac6dc8dac1afb8e399a64fc15139a47c10 11.01MB / 11.01MB  1.9s
 => => sha256:480c574b27605460c496753e26f5a9f40c5e45a0d9a6ac78d39cc213c38cb2c2 234B / 234B      2.0s
 => => sha256:55150497e6cf0cb823b71c6dfc51c1fa4d230aef1253db8e65ef3857b5853b98 2.43MB / 2.43MB  8.0s
 => => extracting sha256:968621624b326084ed82349252b333e649eaab39f71866edb2b9a4f847283680       4.7s
 => => extracting sha256:344249d09d750370bbe46685f9b5f83cf4de8efd1dfe2ac675fadd18fe57fd18       0.3s
 => => extracting sha256:4b1766cb00dac296fd5f7bd412ccac6dc8dac1afb8e399a64fc15139a47c10a6       0.4s
 => ERROR [openmined/grid-vpn-tailscale:0.6.0 stage-0 2/7] RUN --mount=type=cache,target=/var  10.8s
------                                                                                               
 > [openmined/grid-vpn-tailscale:0.6.0 stage-0 2/7] RUN --mount=type=cache,target=/var/cache/apk     apk add --no-cache python3 py3-pip ca-certificates:
#8 0.781 standard_init_linux.go:228: exec user process caused: exec format error
------
failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c apk add --no-cache python3 py3-pip ca-certificates]: exit code: 1

How to Reproduce

  1. Run hagrid launch local_node domain on a Raspberry Pi 4 (with the 64-bit version of Ubuntu)

Expected Behavior

Running hagrid launch local_node domain on a Raspberry Pi 4 (with the 64-bit version of Ubuntu) should start a domain node.

System Information

  • OS: Ubuntu 64bit
  • OS Version: 20.10
  • Language Version: Python 3.8.10
  • Package Manager Version: Pip 21.3.1

Additional Context

  • uname -m gives an aarch64 architecture.
  • Tailscale is available for Raspberry Pis (see here at the bottom).
  • The dockerfile used in hagrid uses FROM shaynesweeney/tailscale:latest as its base image, which in turn uses FROM golang:1.17-alpine AS build-env as its base image.
  • The golang:1.17-alpine image supports linux/arm64/v8.
  • See below the output of hagrid debug
{"datetime": "07/12/2021 08:25:37 UTC", "python_binary": "/home/ubuntu/venvs/fl60/bin/python3.8", "dependencies": {"docker": "/usr/bin/docker", "git": "/usr/bin/git", "ansible-playbook": null}, "environment": {"uname": ["Linux", "raspi04", "5.8.0-1032-raspi", "#35-Ubuntu SMP PREEMPT Wed Jul 14 10:51:21 UTC 2021", "aarch64", "aarch64"], "platform": "linux", "os_version": "5.8.0-1032-raspi", "python_version": "3.8.10"}, "hagrid": "0.1.9", "hagrid_dev": false, "hagrid_path": "/home/ubuntu/venvs/fl60/lib/python3.8/site-packages", "hagrid_repo_sha": "5549dd238995c098aec44ddf13c17dc5dc889fa9", "docker": "Client:\n Context:    default\n Debug Mode: false\n Plugins:\n  app: Docker App (Docker Inc., v0.9.1-beta3)\n  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)\n  compose: Docker Compose (Docker Inc., v2.1.1)\n\nServer:\n Containers: 0\n  Running: 0\n  Paused: 0\n  Stopped: 0\n Images: 0\n Server Version: 20.10.10\n Storage Driver: overlay2\n  Backing Filesystem: extfs\n  Supports d_type: true\n  Native Overlay Diff: true\n  userxattr: false\n Logging Driver: json-file\n Cgroup Driver: cgroupfs\n Cgroup Version: 1\n Plugins:\n  Volume: local\n  Network: bridge host ipvlan macvlan null overlay\n  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog\n Swarm: inactive\n Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc\n Default Runtime: runc\n Init Binary: docker-init\n containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8\n runc version: v1.0.2-0-g52b36a2\n init version: de40ad0\n Security Options:\n  apparmor\n  seccomp\n   Profile: default\n Kernel Version: 5.8.0-1032-raspi\n Operating System: Ubuntu 20.10\n OSType: linux\n Architecture: aarch64\n CPUs: 4\n Total Memory: 3.704GiB\n Name: raspi04\n ID: 7SCX:6WWY:YMWD:4WJ6:2IM4:LVYM:T5ER:5O3H:FP3E:IATL:4KTT:DP6T\n Docker Root Dir: /var/lib/docker\n Debug Mode: false\n Registry: https://index.docker.io/v1/\n Labels:\n Experimental: false\n Insecure Registries:\n  127.0.0.0/8\n Live Restore Enabled: false\n\n"}
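
For readers triaging a similar report, the architecture fields can be pulled out of the hagrid debug output with a few lines of Python. This is only a sketch: the function name summarize_debug and the shortened sample are made up for illustration, and it assumes the JSON layout shown above (machine type at index 4 of environment.uname, docker info text in the docker field).

```python
import json


def summarize_debug(raw: str) -> dict:
    """Extract architecture-related fields from `hagrid debug` JSON output."""
    info = json.loads(raw)
    uname = info["environment"]["uname"]
    # Assumption: index 4 of the uname list is the machine type, as in the report above.
    summary = {"hagrid": info["hagrid"], "machine": uname[4]}
    # The "docker" field holds the plain-text output of `docker info`.
    for line in info.get("docker", "").splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Architecture":
            summary["docker_arch"] = value.strip()
    return summary


# Shortened, hypothetical sample with the same shape as the report above.
sample = json.dumps({
    "environment": {"uname": ["Linux", "raspi04", "5.8.0-1032-raspi",
                              "#35-Ubuntu", "aarch64", "aarch64"]},
    "hagrid": "0.1.9",
    "docker": "Server:\n Architecture: aarch64\n CPUs: 4\n",
})
print(summarize_debug(sample))
```

If machine is aarch64 while the image being pulled was built only for x86_64, an exec format error like the one in the build log is the expected symptom.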

Rene36 avatar Dec 07 '21 08:12 Rene36

Hi @Rene36, could you try with the latest 0.6.0 branch and report whether the error still occurs?

rasswanth-s avatar Jan 10 '22 09:01 rasswanth-s

With some workarounds I fixed the issue on a fresh Ubuntu 20.04 install with PySyft v0.6.0 and hagrid v0.2.0. I fixed the CPU architecture issue by changing two dockerfiles. I tested the setup with two jupyter notebooks and can connect to the domain node on the Raspberry Pi. However, I did not optimize the dockerfiles for efficiency.

  1. Find the path of the installed hagrid package.
  2. Open lib.py and comment out update_repo(repo=GIT_REPO, branch=repo_branch) so that our changes to the dockerfiles are not overwritten.
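
Step 1 can be done from Python without importing the package itself. This is a minimal sketch; package_dir is a made-up helper name, and it assumes lib.py sits directly inside the hagrid package directory.

```python
import importlib.util
from pathlib import Path


def package_dir(name: str) -> Path:
    """Return the directory an installed top-level package was loaded from."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{name} is not installed in this environment")
    # For a regular package, spec.origin points at its __init__.py.
    return Path(spec.origin).parent


if __name__ == "__main__":
    try:
        # Assumption: lib.py lives at the top level of the hagrid package.
        print(package_dir("hagrid") / "lib.py")
    except ModuleNotFoundError as exc:
        print(exc)
```

Run it with the same interpreter (for example the virtualenv's python) that hagrid was installed into, otherwise it reports that the package is missing.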

The tailscale image (shaynesweeney/tailscale:latest) used in /hagrid/PySyft/packages/grid/vpn/tailscale.dockerfile and the waitforit tool used in hagrid/PySyft/packages/grid/backend/backend.dockerfile do not support 64-bit ARM (aarch64). I therefore replaced the former with the tailscale/tailscale image and built waitforit from source. The resulting dockerfiles are at the bottom of this post.

The last waitforit release is almost 4 years old. Therefore, I recommend replacing it with something more current.
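
As a rough idea of what a replacement has to do: the backend dockerfile uses waitforit to let the container wait for other services. A minimal stand-in that polls a TCP port could look like the sketch below. It is not what PySyft uses; wait_for_port and its parameters are made up for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or unreachable: wait a little and try again.
            time.sleep(interval)
    return False


# Hypothetical usage in a start script: block until a database answers.
# if not wait_for_port("db", 5432, timeout=60):
#     raise SystemExit("database did not come up in time")
```

Because it only needs the Python standard library, such a helper is independent of the CPU architecture.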

I logged into the remote Raspberry Pi domain via

import syft as sy

# Replace <ip_address> with the address of the Raspberry Pi
domain = sy.login(email="[email protected]",
                  password="changethis",
                  url="<ip_address>",
                  port=8081)

hagrid/PySyft/packages/grid/backend/backend.dockerfile

#FROM python:3.9.9-slim as build
FROM ubuntu:latest as build

RUN apt-get update && apt-get upgrade -y

# Download and build waitforit from source
RUN DEBIAN_FRONTEND=noninteractive apt-get install git golang -y
RUN git clone https://github.com/maxcnunes/waitforit
WORKDIR waitforit
RUN go build

RUN cp waitforit /usr/local/bin/waitforit

RUN --mount=type=cache,target=/var/cache/apt \
  apt-get install -y --no-install-recommends curl python3-dev gcc make

WORKDIR /app
COPY grid/backend/requirements.txt /app

RUN apt-get install -y python3-pip
RUN apt-get install -y python3-dev libpq-dev  # psycopg2 requirement
# Allow installing dev dependencies to run tests
RUN --mount=type=cache,target=/root/.cache \
  pip install --user "uvicorn[standard]" gunicorn

RUN if [ $(uname -m) = "x86_64" ]; then \
  pip install --user torch==1.10.0+cpu -f https://download.pytorch.org/whl/torch_stable.html; \
  fi

# apple m1 build PyNaCl for aarch64
RUN if [ $(uname -m) != "x86_64" ]; then \
  pip install --user PyNaCl; \
  pip install --user torch==1.10.0 -f https://download.pytorch.org/whl/torch_stable.html; \
  fi

RUN --mount=type=cache,target=/root/.cache \
  pip install --user -r requirements.txt

# allow container to wait for other services
#ENV WAITFORIT_VERSION="v2.4.1"
#COPY grid/backend/waitforit /usr/local/bin/waitforit
#RUN curl -o /usr/local/bin/waitforit -sSL https://github.com/maxcnunes/waitforit/releases/download/>
#  chmod +x /usr/local/bin/waitforit

# Backend
FROM python:3.9.9-slim as backend
ENV PYTHONPATH=/app
ENV PATH=/root/.local/bin:$PATH

# copy start scripts and gunicorn conf
COPY grid/backend/docker-scripts/start.sh /start.sh
COPY grid/backend/docker-scripts/gunicorn_conf.py /gunicorn_conf.py
COPY grid/backend/docker-scripts/start-reload.sh /start-reload.sh
COPY grid/backend/worker-start.sh /worker-start.sh
COPY grid/backend/worker-start-reload.sh /worker-start-reload.sh

RUN chmod +x /start.sh
RUN chmod +x /start-reload.sh
RUN chmod +x /worker-start.sh
RUN chmod +x /worker-start-reload.sh

COPY --from=build /root/.local /root/.local
COPY --from=build /usr/local/bin/waitforit /usr/local/bin/waitforit

#RUN --mount=type=cache,target=/root/.cache
# ---------------------------------------------------------
RUN apt-get update
RUN apt-get install -y python3-dev libpq-dev  # psycopg2 requirement
RUN apt-get update && apt-get install -y libpython3-dev build-essential
WORKDIR /app
COPY grid/backend/requirements.txt .
RUN pip install --user -r requirements.txt

RUN pip install --user tenacity configparser  # ModuleNotFoundError for psycopg2
RUN pip install --user watchdog pyyaml argh psycopg2
# ---------------------------------------------------------

# copy grid
COPY grid/backend /app/

# copy syft
# until we have stable releases make sure to install syft
COPY syft/setup.py /app/syft/setup.py
COPY syft/setup.cfg /app/syft/setup.cfg
COPY syft/src /app/syft/src

# install syft
RUN --mount=type=cache,target=/root/.cache \
  pip install --user -e /app/syft

# change to worker-start.sh or start-reload.sh as needed
CMD ["bash", "start.sh"]

hagrid/PySyft/packages/grid/vpn/tailscale.dockerfile

# FROM shaynesweeney/tailscale:latest
FROM tailscale/tailscale:latest

# see https://github.com/alpine-docker/git/issues/35
RUN apk update && apk upgrade
RUN apk fix
RUN --mount=type=cache,target=/var/cache/apk \
  apk add --no-cache python3 py3-pip ca-certificates

WORKDIR /tailscale
COPY ./requirements.txt /tailscale/requirements.txt
RUN --mount=type=cache,target=/root/.cache \
  pip install --user -r requirements.txt

COPY ./tailscale.sh /tailscale/tailscale.sh
COPY ./tailscale.py /tailscale/tailscale.py

ENV HOSTNAME="node"

CMD ["sh", "-c", "/tailscale/tailscale.sh ${HOSTNAME}"]

Rene36 avatar Jan 14 '22 14:01 Rene36

Hi @Rene36, we recently added support for arm64 linux and it runs in CI on our nightlies. HAGrid now supports an extra option, --platform=linux/arm64, which gets passed to docker. Can you confirm whether this works on the Raspberry Pi?

Also please make sure to update to the latest 0.7.0 beta releases.

$ pip install hagrid
$ hagrid launch domain to docker:8081 --tag=latest --platform=linux/arm64
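
If the launch is scripted, the --platform value can be derived from the host's machine type instead of being hard-coded. The sketch below only builds the argument list from the command above; docker_platform, launch_command and the mapping table are illustrative names, not part of hagrid.

```python
from typing import List

# Machine types as reported by `uname -m`, mapped to Docker platform strings.
DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
}


def docker_platform(machine: str) -> str:
    """Translate a machine type into the matching Docker platform string."""
    try:
        return DOCKER_PLATFORMS[machine.lower()]
    except KeyError:
        raise ValueError(f"no known Docker platform for {machine!r}") from None


def launch_command(machine: str, port: int = 8081, tag: str = "latest") -> List[str]:
    """Build the hagrid launch command shown above for a given machine type."""
    return ["hagrid", "launch", "domain", "to", f"docker:{port}",
            f"--tag={tag}", f"--platform={docker_platform(machine)}"]


# On the Pi itself, platform.machine() from the standard library returns "aarch64".
print(" ".join(launch_command("aarch64")))
```

The resulting list can be handed to subprocess.run once hagrid is installed.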

madhavajay avatar Jun 08 '22 06:06 madhavajay

I am closing this due to no response. arm64 linux builds have been available for a while now and are tested in CI.

madhavajay avatar Sep 14 '22 05:09 madhavajay