PySyft
RuntimeError in Model Centric MNIST example (mcfl_create_plan.ipynb)
Question
RuntimeError found in Model Centric MNIST Example.
Further Information
Running the @make_plan cell in Step 2 prints the DEBUG logger output and then raises RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
Screenshots
The full traceback screenshot looks like this:
System Information
- OS: Ubuntu 20.x
- Env: Conda
- Language Version: [Python 3.8, PySyft v0.5rc1, PyTorch 1.8]
Additional Context
I tried to resolve this by following a solution to a similar question on Stack Overflow, adding requires_grad=True to the training plan, but that didn't solve the problem. I think this may be because ys=th.randint(0, 10, [64 * 3, 10]) defaults to requires_grad=False, as it is not a float type. Also, the training plan uses no_grad():
def sgd_step(model, lr=0.1):
    with ROOT_CLIENT.torch.no_grad():
        for p in model.parameters():
            p.data = p.data - lr * p.grad
            p.grad = th.zeros_like(p.grad.get())
The DEBUG logger output looks OK, so I'm not sure whether this error will affect the overall model training. I'd appreciate it if anyone can solve this :)
Adding another question, about how to host the grid network:
I get ConnectionRefusedError: [Errno 111] Connection refused when I try to use ModelCentricFLClient to set up a client and host a network via syft.grid.client.client.connect(). The screenshot looks like this:
I tried to set up the client manually, following a tutorial in ./PySyft/examples/pygrid/homomorphic-encryption/, but it also returns
ConnectionError: HTTPConnectionPool(host='localhost', port=7000): Max retries exceeded with url: /users/login (Caused by
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd48657af40>: Failed to establish a new connection:
[Errno 111] Connection refused'))
I'm not sure whether I need to run PyGrid in a Docker container to host the network, or whether I can simply run it in the notebook.
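Errno 111 means nothing accepted the TCP connection on that host and port, so a quick check before calling connect() is to probe the port directly. The following is a minimal sketch using only the Python standard library; is_port_open is a hypothetical helper written for illustration, not part of PySyft or PyGrid, and port 7000 is simply the port from the error message above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (ConnectionRefusedError is a
        # subclass) when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical usage: host and port taken from the traceback above.
    if is_port_open("localhost", 7000):
        print("A service is listening on localhost:7000")
    else:
        print("Nothing is listening on localhost:7000; start the grid service first")
```

If the check fails, the problem is on the server side (no service started, or a different port), not in the notebook code.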
Hi @LaplaceZhang
Let's start with your first problem, the @make_plan one.
I couldn't reproduce it in a fresh Ubuntu 20 + conda + py3.8 environment.
For the sake of reproducibility, I've made the following Dockerfile, which installs pysyft 0.5.0rc1 in a conda py3.8 environment:
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y git curl build-essential
RUN curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
RUN bash ./Miniconda3-latest-Linux-x86_64.sh -b
ENV PATH=$PATH:/root/miniconda3/bin
RUN conda create -n syft -y python=3.8
WORKDIR /openmined/pysyft
RUN git clone https://github.com/openmined/pysyft .
RUN git checkout 0.5.0rc1
SHELL ["conda", "run", "-n", "syft", "/bin/sh", "-c"]
RUN pip install jupyter notebook
RUN pip install -Ur requirements.txt
RUN pip install -e .
Now if I build this image (docker build -t bug .) and then execute the notebook in it:
docker run --rm bug conda run -n syft jupyter nbconvert --to notebook --execute examples/federated-learning/model-centric/mcfl_create_plan.ipynb
I see that it successfully passes the cell where the plan is built and fails on grid connect.
So perhaps there's a problem with your environment?
Could you post the output of pip freeze?
Hi, @vvmnnnkv
Yes, the first problem got fixed. The second one happened because I didn't start a PyGrid service before trying to connect to it.
However, after I managed to launch a domain via PyGrid manually (using the command sh run.sh --port 7000 --name bob --start_local_db), the following call in the federated create-plan notebook
response = grid.host_federated_training(
    model=local_model,
    client_plans={"training_plan": train},
    client_protocols={},
    server_averaging_plan=avg_plan,
    client_config=client_config,
    server_config=server_config,
)
returns the error BrokenPipeError: [Errno 32] Broken pipe, which I think is because the WebSocket service is closed. The terminal looks like this:
Debug records:
[2021-05-08 15:37:11]: 134214 DEBUG Initializing WebSocket
[2021-05-08 15:37:11]: 134214 DEBUG Validating WebSocket request
[2021-05-08 15:37:11]: 134214 DEBUG Attempting to upgrade connection
[2021-05-08 15:37:11]: 134214 DEBUG WebSocket request accepted, switching protocols
[2021-05-08 15:37:11]: 134214 DEBUG Closed WebSocket
[2021-05-08 15:37:11]: 134214 DEBUG Failed to write closing frame -> closing socket
[2021-05-08 15:37:11]: 134214 DEBUG Closed WebSocket
And if I open this address in a browser, it says {"error": "This app is in sleep mode. Please undergo the initial setup first"}. I searched for this error online but didn't find much useful information. It seems BrokenPipeError: [Errno 32] Broken pipe is usually attributed to PyTorch, but I think here it may be related to PyGrid starting a WebSocket service. I'm not familiar with WebSockets or sockets, so I'm not sure where it went wrong.
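The "sleep mode" message can also be checked from a script instead of a browser. Below is a minimal standard-library sketch; fetch_body and needs_initial_setup are hypothetical helpers, the URL is assumed to be the address used above, and the HTTP status code the server returns with this message is not known here, so both success and error responses are read.

```python
import json
import urllib.error
import urllib.request


def fetch_body(url: str, timeout: float = 5.0) -> str:
    """Return the response body as text, including bodies of HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a body that may hold the JSON error.
        return err.read().decode("utf-8", errors="replace")


def needs_initial_setup(body: str) -> bool:
    """Return True if the body is a JSON object whose error mentions sleep mode."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return "sleep mode" in str(payload.get("error", ""))


if __name__ == "__main__":
    # Assumed address; adjust to wherever the domain is actually running.
    try:
        body = fetch_body("http://localhost:7000")
    except OSError as exc:
        print(f"Could not reach the server: {exc}")
    else:
        print("Initial setup required:", needs_initial_setup(body))
```

A True result suggests the domain is reachable but has not been through its initial setup yet, which is a different failure from a refused connection.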
Hi, I have the exact same error with the WebSocket. Did you manage to solve this problem?
I didn't. I ended up using Socket + Pickle to build one manually. Also, I followed this demo to learn how to use Socket to host virtual network services. I hope this helps you. 😃
The BrokenPipeError: [Errno 32] Broken pipe issue comes from the fact that the mcfl_create_plan notebook does not completely describe the initialisation required for the grid (the Step 4.1 section in the notebook).
What currently works for me (version 0.5.0) is:
- use PyGrid from PySyft repo (at https://github.com/OpenMined/PySyft/tree/dev/packages/grid/apps/domain)
- start using: "APP_ENV=dev LOCAL_DATABASE=True PORT=7000 ./run.sh"
- run the code below once (to set up the domain; inspired by https://github.com/OpenMined/PySyft/blob/dev/packages/syft/examples/pygrid/tutorials/Getting%20Started.ipynb)
- if you want to run mcfl_create_plan completely, also check https://github.com/OpenMined/PySyft/pull/5520; it contains some fixes that are not yet merged
from syft.grid.client.client import connect
from syft.grid.client.grid_connection import GridHTTPConnection

domain = connect(
    url="http://localhost:7000",
    conn_type=GridHTTPConnection,
)
domain.setup(
    email="[email protected]",
    password="owerpwd",
    domain_name="OpenMined Node",
    token="9G9MJ06OQH",
)
0.5 is no longer supported.