PySyft
hagrid 0.1.9 fails at starting a node on a Raspberry Pi
Description
After running hagrid launch local_node domain on a Raspberry Pi 4 (ARM architecture), the Docker build fails at the tailscale container. Tailscale is responsible for setting up a VPN service. It produces the following error:
=> [openmined/grid-vpn-tailscale:0.6.0 stage-0 1/7] FROM docker.io/shaynesweeney/tailscale:l 10.0s
=> => resolve docker.io/shaynesweeney/tailscale:latest@sha256:b7fd6dcb59b54630b0acb1998d5a235 0.1s
=> => sha256:3748bda6edb3379fc8667761cf2bcc93d877c74c5828c694459bc2fb8a0f1f76 2.00MB / 2.00MB 0.5s
=> => sha256:b7fd6dcb59b54630b0acb1998d5a235e3e6d958a32f47835fd53935044d74bea 951B / 951B 0.0s
=> => sha256:138964341e09acd46a587faff53fbe04446989e2e459b5f1a5088ebe1864c322 1.92kB / 1.92kB 0.0s
=> => sha256:e4b4f71606ce42fa39c23f7cb94b88d454c762b7d761d064525ccff4ab6290 15.33MB / 15.33MB 1.7s
=> => sha256:a0d0a0d46f8b52473982a3c466318f479767577551a53ffc9074c9fa7035982e 2.81MB / 2.81MB 0.8s
=> => extracting sha256:a0d0a0d46f8b52473982a3c466318f479767577551a53ffc9074c9fa7035982e 0.5s
=> => extracting sha256:3748bda6edb3379fc8667761cf2bcc93d877c74c5828c694459bc2fb8a0f1f76 0.6s
=> => extracting sha256:e4b4f71606ce42fa39c23f7cb94b88d454c762b7d761d064525ccff4ab629058 1.4s
=> [openmined/grid-backend:0.6.0 internal] load build context 0.6s
=> => transferring context: 2.71MB 0.5s
=> CANCELED [openmined/grid-backend:0.6.0 build 1/9] FROM docker.io/library/python:3.9.9-sli 20.9s
=> => resolve docker.io/library/python:3.9.9-slim@sha256:d67e4b3e185208a010e0d06dd1f655292dd9 0.1s
=> => sha256:d67e4b3e185208a010e0d06dd1f655292dd92c5dd08a64b1c59a0acbd387b1e9 1.86kB / 1.86kB 0.0s
=> => sha256:f24ca6962ecb3d115207094b4b5b9e216f4502e6a5d50990b03d8798b3a07561 1.37kB / 1.37kB 0.0s
=> => sha256:0e6609df29e4aebc8716fd168776bb538ead592c681a3c9cb19f55287e65a31d 7.89kB / 7.89kB 0.0s
=> => sha256:968621624b326084ed82349252b333e649eaab39f71866edb2b9a4f8472836 30.06MB / 30.06MB 7.9s
=> => sha256:344249d09d750370bbe46685f9b5f83cf4de8efd1dfe2ac675fadd18fe57 859.08kB / 859.08kB 1.0s
=> => sha256:4b1766cb00dac296fd5f7bd412ccac6dc8dac1afb8e399a64fc15139a47c10 11.01MB / 11.01MB 1.9s
=> => sha256:480c574b27605460c496753e26f5a9f40c5e45a0d9a6ac78d39cc213c38cb2c2 234B / 234B 2.0s
=> => sha256:55150497e6cf0cb823b71c6dfc51c1fa4d230aef1253db8e65ef3857b5853b98 2.43MB / 2.43MB 8.0s
=> => extracting sha256:968621624b326084ed82349252b333e649eaab39f71866edb2b9a4f847283680 4.7s
=> => extracting sha256:344249d09d750370bbe46685f9b5f83cf4de8efd1dfe2ac675fadd18fe57fd18 0.3s
=> => extracting sha256:4b1766cb00dac296fd5f7bd412ccac6dc8dac1afb8e399a64fc15139a47c10a6 0.4s
=> ERROR [openmined/grid-vpn-tailscale:0.6.0 stage-0 2/7] RUN --mount=type=cache,target=/var 10.8s
------
> [openmined/grid-vpn-tailscale:0.6.0 stage-0 2/7] RUN --mount=type=cache,target=/var/cache/apk apk add --no-cache python3 py3-pip ca-certificates:
#8 0.781 standard_init_linux.go:228: exec user process caused: exec format error
------
failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c apk add --no-cache python3 py3-pip ca-certificates]: exit code: 1
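An "exec format error" is what the Linux kernel reports when it is asked to execute a binary built for a different CPU architecture. That matches the diagnosis later in this thread that the tailscale base image does not support aarch64. A minimal sketch, assuming a Linux host, for checking which architecture a binary targets by reading the e_machine field of its ELF header (elf_machine and can_run_natively are hypothetical helpers, not part of hagrid):

```python
import platform
import struct

# Subset of ELF e_machine values (from the ELF specification) relevant here.
ELF_MACHINES = {
    0x03: "x86",
    0x28: "arm",
    0x3E: "x86_64",
    0xB7: "aarch64",
}


def elf_machine(header: bytes) -> str:
    """Return the CPU architecture an ELF binary was built for."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF binary")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF files.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")


def can_run_natively(path: str) -> bool:
    """Compare a binary's architecture with the host's (Linux naming, e.g. 'aarch64')."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20)) == platform.machine()
```

For example, running elf_machine on the first 20 bytes of /bin/sh extracted from an image tells you whether that image was built for x86_64 or aarch64.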
How to Reproduce
- Run hagrid launch local_node domain on a Raspberry Pi 4 (with the 64-bit version of Ubuntu).
Expected Behavior
If you run hagrid launch local_node domain on a Raspberry Pi 4 (with the 64-bit version of Ubuntu), it should start a domain node.
System Information
- OS: Ubuntu 64bit
- OS Version: 20.10
- Language Version: Python 3.8.10
- Package Manager Version: Pip 21.3.1
Additional Context
- uname -m reports an aarch64 architecture.
- Tailscale is available for Raspberry Pis (see here at the bottom).
- The Dockerfile used in hagrid uses FROM shaynesweeney/tailscale:latest as its base image (link), which in turn uses FROM golang:1.17-alpine AS build-env as its base image (link).
- The image golang:1.17-alpine supports linux/arm64/v8 (link).
- See below the output of hagrid debug:
{"datetime": "07/12/2021 08:25:37 UTC", "python_binary": "/home/ubuntu/venvs/fl60/bin/python3.8", "dependencies": {"docker": "/usr/bin/docker", "git": "/usr/bin/git", "ansible-playbook": null}, "environment": {"uname": ["Linux", "raspi04", "5.8.0-1032-raspi", "#35-Ubuntu SMP PREEMPT Wed Jul 14 10:51:21 UTC 2021", "aarch64", "aarch64"], "platform": "linux", "os_version": "5.8.0-1032-raspi", "python_version": "3.8.10"}, "hagrid": "0.1.9", "hagrid_dev": false, "hagrid_path": "/home/ubuntu/venvs/fl60/lib/python3.8/site-packages", "hagrid_repo_sha": "5549dd238995c098aec44ddf13c17dc5dc889fa9", "docker": "Client:\n Context: default\n Debug Mode: false\n Plugins:\n app: Docker App (Docker Inc., v0.9.1-beta3)\n buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)\n compose: Docker Compose (Docker Inc., v2.1.1)\n\nServer:\n Containers: 0\n Running: 0\n Paused: 0\n Stopped: 0\n Images: 0\n Server Version: 20.10.10\n Storage Driver: overlay2\n Backing Filesystem: extfs\n Supports d_type: true\n Native Overlay Diff: true\n userxattr: false\n Logging Driver: json-file\n Cgroup Driver: cgroupfs\n Cgroup Version: 1\n Plugins:\n Volume: local\n Network: bridge host ipvlan macvlan null overlay\n Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog\n Swarm: inactive\n Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc\n Default Runtime: runc\n Init Binary: docker-init\n containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8\n runc version: v1.0.2-0-g52b36a2\n init version: de40ad0\n Security Options:\n apparmor\n seccomp\n Profile: default\n Kernel Version: 5.8.0-1032-raspi\n Operating System: Ubuntu 20.10\n OSType: linux\n Architecture: aarch64\n CPUs: 4\n Total Memory: 3.704GiB\n Name: raspi04\n ID: 7SCX:6WWY:YMWD:4WJ6:2IM4:LVYM:T5ER:5O3H:FP3E:IATL:4KTT:DP6T\n Docker Root Dir: /var/lib/docker\n Debug Mode: false\n Registry: https://index.docker.io/v1/\n Labels:\n Experimental: false\n Insecure Registries:\n 127.0.0.0/8\n Live Restore Enabled: false\n\n"}
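Whether a base image such as golang:1.17-alpine or shaynesweeney/tailscale:latest ships an arm64 variant can be checked from its manifest. A sketch, assuming your Docker CLI provides docker manifest inspect and that the image is published as a multi-arch manifest list (single-arch images return no platform list, so nothing can be inferred from them this way); image_platforms and supports_arm64 are hypothetical helpers:

```python
import json
import subprocess


def image_platforms(manifest_json: str) -> set:
    """Extract 'os/arch[/variant]' strings from `docker manifest inspect` output."""
    data = json.loads(manifest_json)
    platforms = set()
    # Multi-arch images return a manifest list with one entry per platform;
    # single-arch manifests have no "manifests" key, so the result stays empty.
    for entry in data.get("manifests", []):
        p = entry.get("platform", {})
        parts = [p.get("os"), p.get("architecture"), p.get("variant")]
        platforms.add("/".join(x for x in parts if x))
    return platforms


def supports_arm64(image: str) -> bool:
    """Query the Docker CLI for an image's manifest and look for a linux/arm64 entry."""
    out = subprocess.run(
        ["docker", "manifest", "inspect", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return any(p.startswith("linux/arm64") for p in image_platforms(out))
```

On the Pi, supports_arm64("golang:1.17-alpine") should agree with the linux/arm64/v8 support noted above.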
Hi @Rene36, could you try the latest 0.6.0 branch and report whether the error still occurs?
With some workarounds, I fixed the issue on a fresh Ubuntu 20.04 install with PySyft v0.6.0 and hagrid v0.2.0. I resolved the CPU architecture problem by changing two Dockerfiles. I tested the setup with two Jupyter notebooks and can connect to the domain node on the Raspberry Pi. However, I did not optimize it for efficiency.
- Find the path of the installed hagrid package.
- Open lib.py and comment out update_repo(repo=GIT_REPO, branch=repo_branch) so that our changes to the Dockerfiles are not overwritten.
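The two steps above can also be scripted. A minimal sketch, assuming lib.py sits directly inside the installed hagrid package directory and that the call appears there verbatim as update_repo(repo=GIT_REPO, branch=repo_branch); find_module_file and comment_out_call are hypothetical helper names:

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Assumed exact text of the call to disable in hagrid's lib.py.
TARGET_CALL = "update_repo(repo=GIT_REPO, branch=repo_branch)"


def find_module_file(package: str, filename: str) -> Optional[Path]:
    """Return <package dir>/<filename> for an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    candidate = Path(list(spec.submodule_search_locations)[0]) / filename
    return candidate if candidate.exists() else None


def comment_out_call(path: Path, call: str = TARGET_CALL) -> int:
    """Prefix every uncommented line containing `call` with '# '; return the count."""
    lines = path.read_text().splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if call in line and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]  # keep the original indentation
            lines[i] = f"{indent}# {stripped}"
            changed += 1
    path.write_text("".join(lines))
    return changed
```

Usage: path = find_module_file("hagrid", "lib.py"), then comment_out_call(path) if a path was found. If the call is the only statement in its block, commenting it out leaves an empty block, so replace it with pass in that case.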
The tailscale image used in hagrid/PySyft/packages/grid/vpn/tailscale.dockerfile (shaynesweeney/tailscale:latest) and the waitforit binary used in hagrid/PySyft/packages/grid/backend/backend.dockerfile do not support 64-bit ARM (aarch64). Therefore, I replaced the former with the official tailscale image and built waitforit from source.
The resulting dockerfiles are at the bottom of this post.
The last release of waitforit is almost 4 years old, so I recommend replacing it with something more current.
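As one possible replacement, assuming the only requirement is to block until a TCP port accepts connections, a pure-Python helper avoids downloading an architecture-specific binary altogether. This is a hypothetical sketch (wait_for_port is not something PySyft ships) and it does not cover waitforit's other modes:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Block until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the service is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Because it only uses the standard library, it runs unchanged on x86_64 and aarch64 images that already contain Python.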
I logged into the remote Raspberry Pi domain via:
import syft as sy
domain = sy.login(email="info@openmined.org",
                  password="changethis",
                  url="<ip_address>",
                  port=8081)
hagrid/PySyft/packages/grid/backend/backend.dockerfile
#FROM python:3.9.9-slim as build
FROM ubuntu:latest as build
RUN apt-get update && apt-get upgrade -y
# Download and build waitforit from source
RUN DEBIAN_FRONTEND=noninteractive apt-get install git golang -y
RUN git clone https://github.com/maxcnunes/waitforit
WORKDIR waitforit
RUN go build
RUN cp waitforit /usr/local/bin/waitforit
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get install -y --no-install-recommends curl python3-dev gcc make
WORKDIR /app
COPY grid/backend/requirements.txt /app
RUN apt-get install -y python3-pip
RUN apt-get install -y python3-dev libpq-dev # psycopg2 requirement
# Allow installing dev dependencies to run tests
RUN --mount=type=cache,target=/root/.cache \
    pip install --user "uvicorn[standard]" gunicorn
RUN if [ $(uname -m) = "x86_64" ]; then \
pip install --user torch==1.10.0+cpu -f https://download.pytorch.org/whl/torch_stable.html; \
fi
# apple m1 build PyNaCl for aarch64
RUN if [ $(uname -m) != "x86_64" ]; then \
pip install --user PyNaCl; \
pip install --user torch==1.10.0 -f https://download.pytorch.org/whl/torch_stable.html; \
fi
RUN --mount=type=cache,target=/root/.cache \
    pip install --user -r requirements.txt
# allow container to wait for other services
#ENV WAITFORIT_VERSION="v2.4.1"
#COPY grid/backend/waitforit /usr/local/bin/waitforit
#RUN curl -o /usr/local/bin/waitforit -sSL https://github.com/maxcnunes/waitforit/releases/download/>
# chmod +x /usr/local/bin/waitforit
# Backend
FROM python:3.9.9-slim as backend
ENV PYTHONPATH=/app
ENV PATH=/root/.local/bin:$PATH
# copy start scripts and gunicorn conf
COPY grid/backend/docker-scripts/start.sh /start.sh
COPY grid/backend/docker-scripts/gunicorn_conf.py /gunicorn_conf.py
COPY grid/backend/docker-scripts/start-reload.sh /start-reload.sh
COPY grid/backend/worker-start.sh /worker-start.sh
COPY grid/backend/worker-start-reload.sh /worker-start-reload.sh
RUN chmod +x /start.sh
RUN chmod +x /start-reload.sh
RUN chmod +x /worker-start.sh
RUN chmod +x /worker-start-reload.sh
COPY --from=build /root/.local /root/.local
COPY --from=build /usr/local/bin/waitforit /usr/local/bin/waitforit
#RUN --mount=type=cache,target=/root/.cache
# ---------------------------------------------------------
RUN apt-get update
RUN apt-get install -y python3-dev libpq-dev # psycopg2 requirement
RUN apt-get update && apt-get install -y libpython3-dev build-essential
WORKDIR /app
COPY grid/backend/requirements.txt .
RUN pip install --user -r requirements.txt
RUN pip install --user tenacity configparser # ModuleNotFoundError for psycopg2
RUN pip install --user watchdog pyyaml argh psycopg2
# ---------------------------------------------------------
# copy grid
COPY grid/backend /app/
# copy syft
# until we have stable releases make sure to install syft
COPY syft/setup.py /app/syft/setup.py
COPY syft/setup.cfg /app/syft/setup.cfg
COPY syft/src /app/syft/src
# install syft
RUN --mount=type=cache,target=/root/.cache \
    pip install --user -e /app/syft
# change to worker-start.sh or start-reload.sh as needed
CMD ["bash", "start.sh"]
hagrid/PySyft/packages/grid/vpn/tailscale.dockerfile
# FROM shaynesweeney/tailscale:latest
FROM tailscale/tailscale:latest
# see https://github.com/alpine-docker/git/issues/35
RUN --mount=type=cache,target=/var/cache/apk \
    apk update && apk upgrade
RUN apk fix
RUN apk add --no-cache python3 py3-pip ca-certificates
WORKDIR /tailscale
COPY ./requirements.txt /tailscale/requirements.txt
RUN --mount=type=cache,target=/root/.cache \
    pip install --user -r requirements.txt
COPY ./tailscale.sh /tailscale/tailscale.sh
COPY ./tailscale.py /tailscale/tailscale.py
ENV HOSTNAME="node"
CMD ["sh", "-c", "/tailscale/tailscale.sh ${HOSTNAME}"]
Hi @Rene36, we recently added support for arm64 Linux, and it runs in CI on our nightly builds. HAGrid now supports an extra option, --platform linux/arm64, which is passed through to Docker. Can you confirm whether this works on a Raspberry Pi?
Also, please make sure to update to the latest 0.7.0 beta release.
$ pip install hagrid
$ hagrid launch domain to docker:8081 --tag=latest --platform=linux/arm64
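If you script the launch, the --platform value can be derived from the host architecture (uname -m reports aarch64 on the Pi above). A sketch with a hypothetical mapping table that only covers the amd64 and arm64 platforms discussed in this thread; docker_platform and hagrid_launch_command are illustrative names, and the command arguments are copied from the example above:

```python
import platform

# Hypothetical mapping from `uname -m` / platform.machine() values to Docker platforms.
DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",  # e.g. 64-bit Ubuntu on a Raspberry Pi 4
    "arm64": "linux/arm64",
}


def docker_platform(machine: str = "") -> str:
    """Map a machine name (default: this host) to a Docker --platform value."""
    machine = (machine or platform.machine()).lower()
    try:
        return DOCKER_PLATFORMS[machine]
    except KeyError:
        raise ValueError(f"no Docker platform mapping for {machine!r}") from None


def hagrid_launch_command(machine: str = "") -> list:
    """Build the launch command shown above with --platform set for the host."""
    return [
        "hagrid", "launch", "domain", "to", "docker:8081",
        "--tag=latest", f"--platform={docker_platform(machine)}",
    ]
```

For example, subprocess.run(hagrid_launch_command(), check=True) would run the launch on the Pi with --platform=linux/arm64.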
I am closing this due to no response. arm64 linux builds have been available for a while now and are tested in CI.