Unable to run MNIST experiments with run_fedavg_distributed_pytorch.sh

Open • xlw686 opened this issue 3 years ago • 6 comments

I am following the tutorial below:

  • https://github.com/FedML-AI/FedML/blob/master/fedml_experiments/distributed/fedavg/README.md

I ran the shell script with the following arguments:
sh run_fedavg_distributed_pytorch.sh 10 10 cnn hetero 100 1 20 0.1 femnist "./../../../data/FederatedEMNIST/datasets" sgd sgd GRPC grpc_ipconfig.csv 0

Something seems to have gone wrong. No results were produced, and the run ended with:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 91869 RUNNING AT VM-24-3-ubuntu
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

xlw686 • Apr 18 '22

@xlw686 you set the worker number to 10. Could that exceed your local machine's memory?

chaoyanghe • Apr 18 '22
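"Killed (signal 9)" means the process received SIGKILL. On Linux a common (but not the only) sender is the kernel's out-of-memory killer, which would fit the memory hypothesis above. The following is a minimal sketch for estimating how much memory each process would get for a given worker number. The helper names are hypothetical, `SC_AVPHYS_PAGES` is Linux-specific, and the assumption of one extra (server) process per run is not confirmed from the FedML script.

```python
import os


def available_memory_bytes():
    """Currently available physical memory in bytes, or None when the
    platform does not expose SC_AVPHYS_PAGES (it is Linux/glibc-specific)."""
    names = getattr(os, "sysconf_names", {})
    if "SC_AVPHYS_PAGES" not in names or "SC_PAGE_SIZE" not in names:
        return None
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def per_process_budget(available_bytes, worker_num, extra_processes=1):
    """Split available memory evenly across all launched processes.

    ASSUMPTION: the launcher starts worker_num + extra_processes processes
    (e.g. one extra server/aggregator process); this is a hypothetical model.
    """
    if worker_num < 1:
        raise ValueError("worker_num must be >= 1")
    return available_bytes // (worker_num + extra_processes)


if __name__ == "__main__":
    avail = available_memory_bytes()
    if avail is None:
        print("SC_AVPHYS_PAGES is not available on this platform")
    else:
        for workers in (1, 3, 10):
            mib = per_process_budget(avail, workers) / 2**20
            print(f"{workers:>2} workers -> ~{mib:,.0f} MiB per process")
```

If the per-process figure for 10 workers is smaller than what one training process needs for the chosen model and dataset, lowering the worker number is the simplest thing to try.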

I changed the worker number to 1 and then ran the script below:

 sh run_fedavg_distributed_pytorch.sh 1000 1 lr hetero 200 1 10 0.03 mnist "./../../../data/mnist" sgd 0

Here is the error output:

(fedml) root@VM-24-3-ubuntu:~/share/FedML/fedml_experiments/distributed/fedavg# sh run_fedavg_distributed_pytorch.sh 1000 1 lr hetero 200 1 10 0.03 mnist "./../../../data/mnist" sgd 0
2
/root/anaconda3/envs/fedml/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
(same warning repeated 7 more times)
Traceback (most recent call last):
  File "./main_fedavg.py", line 43, in <module>
    from fedml_api.distributed.fedavg.FedAvgAPI import FedML_init, FedML_FedAvg_distributed
  File "/root/share/FedML/fedml_api/distributed/fedavg/FedAvgAPI.py", line 1, in <module>
    from mpi4py import MPI
  File "/root/share/mpi4py.py", line 1, in <module>
    from mpi4py import MPI
ImportError: cannot import name 'MPI' from 'mpi4py' (/root/share/mpi4py.py)
Traceback (most recent call last):
  File "./main_fedavg.py", line 43, in <module>
    from fedml_api.distributed.fedavg.FedAvgAPI import FedML_init, FedML_FedAvg_distributed
  File "/root/share/FedML/fedml_api/distributed/fedavg/FedAvgAPI.py", line 1, in <module>
    from mpi4py import MPI
  File "/root/share/mpi4py.py", line 1, in <module>
    from mpi4py import MPI
ImportError: cannot import name 'MPI' from 'mpi4py' (/root/share/mpi4py.py)

xlw686 • Apr 19 '22

The error says MPI cannot be imported from mpi4py, but importing it directly in the Python interpreter works:

ImportError: cannot import name 'MPI' from 'mpi4py' (/root/share/mpi4py.py)
(fedml) root@VM-24-3-ubuntu:~/share/FedML/fedml_experiments/distributed/fedavg# python
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mpi4py
>>> from mpi4py import MPI
>>> comm = MPI.COMM_WORLD
>>> process_id = comm.Get_rank()
>>> print(process_id)
0
>>> 

I don't know what went wrong😂

xlw686 • Apr 19 '22
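Both tracebacks show `from mpi4py import MPI` resolving to `/root/share/mpi4py.py` rather than the installed package. This suggests a local file named `mpi4py.py` in `/root/share` is shadowing mpi4py when the script runs. Something in the launch path evidently puts `/root/share` on `sys.path`, while the interactive session above was started from a directory without such a file. The following is a minimal, standard-library-only sketch for checking which file a module name resolves to. `module_origin` and `shadowed_origin` are hypothetical helper names, and the `sys.path.insert` step only mimics what a launcher script might do.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def module_origin(name):
    """Return the file Python would load for `name`, or None if not found.
    find_spec locates the module without importing it."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def shadowed_origin(name, shadow_dir):
    """Report where `name` resolves once `shadow_dir` is first on sys.path,
    mimicking a script that does sys.path.insert(0, <some parent dir>)."""
    sys.path.insert(0, shadow_dir)
    importlib.invalidate_caches()
    try:
        return module_origin(name)
    finally:
        sys.path.remove(shadow_dir)  # removes the entry inserted above
        importlib.invalidate_caches()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A stray test file that happens to share the package's name.
        stray = os.path.join(tmp, "mpi4py.py")
        with open(stray, "w") as f:
            f.write("from mpi4py import MPI\n")
        print("normal lookup:  ", module_origin("mpi4py"))
        print("with shadow dir:", shadowed_origin("mpi4py", tmp))
```

If the lookup points at a stray file (here `/root/share/mpi4py.py`), renaming or removing that file would likely let the real mpi4py package load again.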

The worker number should be at least 3.

chaoyanghe • Apr 19 '22

I changed the worker number to 3, but the result is the same:

 sh run_fedavg_distributed_pytorch.sh 1000 3 lr hetero 200 1 10 0.03 mnist "./../../../data/mnist" sgd 0

Here is the error output:

Traceback (most recent call last):
  File "./main_fedavg.py", line 43, in <module>
    from fedml_api.distributed.fedavg.FedAvgAPI import FedML_init, FedML_FedAvg_distributed
  File "/root/share/FedML/fedml_api/distributed/fedavg/FedAvgAPI.py", line 1, in <module>
    from mpi4py import MPI
  File "/root/share/mpi4py.py", line 1, in <module>
    from mpi4py import MPI
ImportError: cannot import name 'MPI' from 'mpi4py' (/root/share/mpi4py.py)


xlw686 • Apr 20 '22

@xlw686 is this issue solved in the latest version?

chaoyanghe • Aug 19 '22

@xlw686 Can you run your example using the latest dev branch?

fedml-dimitris • Oct 24 '23