
RuntimeError in Model Centric MNIST example (mcfl_create_plan.ipynb)

Open LaplaceZhang opened this issue 3 years ago • 6 comments

Question

RuntimeError found in Model Centric MNIST Example.

Further Information

Running the @make_plan cell in Step 2 prints DEBUG log output and then raises RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

Screenshots

The full traceback is shown in this screenshot: [image: FullTraceback]

System Information

  • OS: Ubuntu 20.x
  • Env: Conda
  • Language Version: [Python 3.8, PySyft v0.5rc1, PyTorch 1.8]

Additional Context

I tried to resolve this by following a solution to a similar question on Stack Overflow, adding requires_grad=True in the training plan, but that didn't solve the problem. I think this may be because ys=th.randint(0, 10, [64 * 3, 10]) defaults to requires_grad=False, as it is not a float type. Also, the SGD step used by the training plan runs under no_grad():

 def sgd_step(model, lr=0.1):
     with ROOT_CLIENT.torch.no_grad():
         for p in model.parameters():
             p.data = p.data - lr * p.grad
             p.grad = th.zeros_like(p.grad.get())

The DEBUG log output looks OK, so I'm not sure whether this error will affect the overall model training. I'd appreciate it if anyone could help solve this :)
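For readers hitting the same message, here is a minimal sketch in plain PyTorch (not PySyft's plan tracing, so it may not match the exact cause in this issue) of when autograd raises it. The tensor shapes and variable names are illustrative assumptions. It shows that the error appears when nothing feeding the loss requires grad, that integer tensors such as those from th.randint cannot require grad at all, and that integer class labels work fine as long as the parameters require grad.

```python
import torch as th

x = th.randn(4, 3)

# Case 1: nothing feeding the loss requires grad, so backward() has no graph.
w = th.randn(3, 2)  # requires_grad defaults to False
try:
    (x @ w).sum().backward()
    no_grad_error = None
except RuntimeError as err:
    no_grad_error = str(err)

# Case 2: integer tensors (e.g. from th.randint) cannot be made to require grad.
try:
    th.randint(0, 10, (4,), requires_grad=True)
    int_error = None
except RuntimeError as err:
    int_error = str(err)

# Case 3: a float parameter that requires grad; integer labels need no grad.
w = th.randn(3, 2, requires_grad=True)
labels = th.randint(0, 2, (4,))
loss = th.nn.functional.cross_entropy(x @ w, labels)
loss.backward()

# Manual SGD step under no_grad(), so the update itself is not tracked.
with th.no_grad():
    w -= 0.1 * w.grad
    w.grad.zero_()

print(no_grad_error)
print(int_error)
```

In this sketch the no_grad() block only wraps the parameter update, after backward() has already run, which is the usual pattern in plain PyTorch.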

LaplaceZhang avatar Apr 27 '21 21:04 LaplaceZhang

Adding another question, about how to host the grid network:

I get ConnectionRefusedError: [Errno 111] Connection refused when I try to use ModelCentricFLClient to set up the client and host a network via syft.grid.client.client.connect(). The screenshot looks like this: [screenshot, 2021-04-28 22-14-29] I also tried to set up the client manually, following a tutorial in ./PySyft/examples/pygrid/homomorphic-encryption/, but it also returns

  ConnectionError: HTTPConnectionPool(host='localhost', port=7000): Max retries exceeded with url: /users/login (Caused by 
  NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd48657af40>: Failed to establish a new connection: 
  [Errno 111] Connection refused'))

I'm not sure whether I need to run PyGrid in a Docker container to host the network, or whether I can simply run it from the notebook.
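Errno 111 means that nothing accepted the TCP connection on that host and port. A quick way to confirm this before calling connect() is a plain socket probe. This is a standard-library sketch; is_port_open is a hypothetical helper name, not part of PySyft or PyGrid.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and unreachable hosts.
        return False
```

For the setup in this thread, is_port_open("localhost", 7000) returning False would suggest that no grid service is listening yet.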

LaplaceZhang avatar Apr 28 '21 21:04 LaplaceZhang

Hi @LaplaceZhang, let's start with your first problem, the one with @make_plan. I couldn't reproduce it in a fresh Ubuntu 20 + conda + py3.8 environment.

For the sake of reproducibility, I've made the following Dockerfile, which installs pysyft 0.5.0rc1 in a conda py3.8 environment:

FROM ubuntu:20.04

RUN apt-get update && apt-get install -y git curl build-essential

RUN curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
RUN bash ./Miniconda3-latest-Linux-x86_64.sh -b
ENV PATH=$PATH:/root/miniconda3/bin
RUN conda create -n syft -y python=3.8

WORKDIR /openmined/pysyft
RUN git clone https://github.com/openmined/pysyft .
RUN git checkout 0.5.0rc1

SHELL ["conda", "run", "-n", "syft", "/bin/sh", "-c"]
RUN pip install jupyter notebook
RUN pip install -Ur requirements.txt
RUN pip install -e .

Now if I build this image (docker build -t bug .) and then execute the notebook in it:

 docker run --rm bug conda run -n syft jupyter nbconvert --to notebook --execute examples/federated-learning/model-centric/mcfl_create_plan.ipynb

I see that it successfully passes the cell where the plan is built and fails on grid connect.

So perhaps there's a problem with your environment? Could you post pip freeze?
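If a full pip freeze is too noisy, the versions of the relevant packages can also be read from inside the notebook. This is a small sketch using the standard library (Python 3.8+); installed_versions is a hypothetical helper, and the distribution names "syft" and "torch" are assumptions about how the packages were installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["syft", "torch"])
for name, version in report.items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```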

vvmnnnkv avatar May 05 '21 10:05 vvmnnnkv

Hi, @vvmnnnkv

Yes, the first problem is fixed. The second one happened because I didn't start a PyGrid service before trying to connect to it.

However, after I managed to launch a domain via PyGrid manually (using the command sh run.sh --port 7000 --name bob --start_local_db), the following call in the federated create-plan notebook

 response = grid.host_federated_training(
     model=local_model,
     client_plans={"training_plan": train},
     client_protocols={},
     server_averaging_plan=avg_plan,
     client_config=client_config,
     server_config=server_config,
 )

returns BrokenPipeError: [Errno 32] Broken pipe, which I think is because the WebSocket service is closed. The terminal looks like this:

[screenshot, 2021-05-08 15-37-58]

Debug records:

 [2021-05-08 15:37:11]: 134214 DEBUG Initializing WebSocket
 [2021-05-08 15:37:11]: 134214 DEBUG Validating WebSocket request
 [2021-05-08 15:37:11]: 134214 DEBUG Attempting to upgrade connection
 [2021-05-08 15:37:11]: 134214 DEBUG WebSocket request accepted, switching protocols
 [2021-05-08 15:37:11]: 134214 DEBUG Closed WebSocket
 [2021-05-08 15:37:11]: 134214 DEBUG Failed to write closing frame -> closing socket
 [2021-05-08 15:37:11]: 134214 DEBUG Closed WebSocket

And if I open this address in a browser, it says {"error": "This app is in sleep mode. Please undergo the initial setup first"}. I searched for this error online but didn't find much useful information. BrokenPipeError: [Errno 32] Broken pipe usually seems to be attributed to PyTorch, but I think here it may be related to PyGrid starting the WebSocket service. I'm not familiar with WebSockets or sockets, so I'm not sure where it went wrong.
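The "sleep mode" response can also be checked from Python rather than a browser. This is a standard-library sketch; fetch_body and node_needs_setup are hypothetical helpers, and the only thing taken from this thread is the JSON error message itself. Which HTTP status the node uses for that message is not known here, so both normal and error responses are read.

```python
import json
from urllib import error, request


def fetch_body(url: str, timeout: float = 5.0) -> str:
    """Return the response body as text, including bodies of HTTP error replies."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except error.HTTPError as exc:
        return exc.read().decode("utf-8", errors="replace")


def node_needs_setup(body: str) -> bool:
    """Return True if the body is a JSON error that mentions sleep mode."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return "sleep mode" in str(payload.get("error", "")).lower()
```

With a node running locally, node_needs_setup(fetch_body("http://localhost:7000")) would indicate whether the initial setup still has to be done.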

LaplaceZhang avatar May 08 '21 15:05 LaplaceZhang

Hi, I have the exact same error with the WebSocket. Did you manage to solve this problem?

BaptisteTomasin avatar Jun 08 '21 07:06 BaptisteTomasin

I didn't. I ended up using Socket + Pickle to build one manually. Also, I followed this demo to learn how to use Socket to host virtual network services. I hope this helps you. 😃
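For anyone curious what a Socket + Pickle exchange can look like, here is a minimal sketch. It is not the code used above; the length-prefixed framing, the helper names, and the example payload are all assumptions for illustration. Note that pickle.loads must only be used with peers you trust, since unpickling can execute arbitrary code.

```python
import pickle
import socket
import struct


def send_obj(sock, obj):
    """Send a pickled object, prefixed with its length as a 4-byte big-endian int."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes the connection early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_obj(sock):
    """Receive one length-prefixed pickled object."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))


# Demo with a connected in-process socket pair standing in for client and server.
client, server = socket.socketpair()
update = {"round": 1, "weights": [0.1, -0.2, 0.3]}
send_obj(client, update)
received = recv_obj(server)
client.close()
server.close()
print(received)
```

The length prefix is there because TCP is a byte stream: without it the receiver cannot tell where one pickled message ends and the next begins.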

LaplaceZhang avatar Jun 08 '21 16:06 LaplaceZhang

The issue BrokenPipeError: [Errno 32] Broken pipe comes from the fact that the notebook mcfl_create_plan does not completely describe the initialisation required for the grid (the Step 4.1 section in the notebook).

What currently works for me (version 0.5.0) is:

  • use PyGrid from PySyft repo (at https://github.com/OpenMined/PySyft/tree/dev/packages/grid/apps/domain)
  • start it using: "APP_ENV=dev LOCAL_DATABASE=True PORT=7000 ./run.sh"
  • run the code below once (to set up the domain; inspired by https://github.com/OpenMined/PySyft/blob/dev/packages/syft/examples/pygrid/tutorials/Getting%20Started.ipynb)
  • if you want to run mcfl_create_plan completely, also check https://github.com/OpenMined/PySyft/pull/5520; it contains some fixes that are not yet merged

from syft.grid.client.client import connect
from syft.grid.client.grid_connection import GridHTTPConnection

domain = connect(
    url="http://localhost:7000", 
    conn_type=GridHTTPConnection,
)

domain.setup(
    email="[email protected]",
    password="owerpwd",
    domain_name="OpenMined Node",
    token="9G9MJ06OQH",
)

vladmihaisima avatar Jul 15 '21 14:07 vladmihaisima

0.5 is no longer supported.

madhavajay avatar Nov 17 '22 07:11 madhavajay