
About the args in the FedML Parrot examples.

JLU-Neal opened this issue 2 years ago • 9 comments

From step 3 in the docs (https://doc.fedml.ai/simulation/examples/sp_fedavg_mnist_lr_example.html), I am told to run the example with: python torch_fedavg_mnist_lr_one_line_example.py --cf fedml_config.yaml. However, when I modified the args in this YAML file (e.g. set using_gpu to true), the training still ran on the CPU at runtime. So I checked the code in fedml/lib/python3.7/site-packages/fedml/arguments.py, line 63, and found the following snippet:

    path_current_file = path.abspath(path.dirname(__file__))
    if training_type == "simulation" and comm_backend == "single_process":
        config_file = path.join(path_current_file, "config/simulation_sp/fedml_config.yaml")
        cmd_args.yaml_config_file = config_file
    elif training_type == "simulation" and comm_backend == "MPI":
        config_file = path.join(
            path_current_file, "config/simulaton_mpi/fedml_config.yaml"
        )
        cmd_args.yaml_config_file = config_file
    elif training_type == "cross_silo":
        pass
    elif training_type == "cross_device":
        pass
    else:
        pass

It seems that during simulation, no matter how you set your YAML file, the default one gets loaded.
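
A quick way to confirm which YAML is actually picked up is to print the resolved path right after initialization, e.g. (just a sketch, assuming the yaml_config_file attribute from the snippet above is still present on the returned args):

    import fedml

    # Print which config file FedML actually resolved (attribute name taken
    # from the snippet above; it may differ or be absent in other versions).
    args = fedml.init()
    print(args.yaml_config_file)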

JLU-Neal avatar May 09 '22 14:05 JLU-Neal

Hi, if you look at torch_fedavg_mnist_lr_one_line_example.py, it calls run_simulation(backend="single_process") in the __init__.py file of fedml (https://github.com/FedML-AI/FedML/blob/master/python/fedml/init.py). run_simulation(backend="single_process") sets

global _global_training_type
_global_training_type = "simulation"
global _global_comm_backend
_global_comm_backend = backend
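
For context, the one_line example itself is essentially just an import plus that call, roughly (a sketch, not the exact file contents):

    import fedml

    if __name__ == "__main__":
        # One-line API: parses the CLI args internally, loads a config,
        # and runs the single-process simulation.
        fedml.run_simulation(backend="single_process")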

If you comment out the assignments to _global_training_type and _global_comm_backend inside run_simulation(backend="single_process") in __init__.py, they keep their default values of None, which bypasses the if/elif branches with the hardcoded config file paths, and the config file you specified is used instead.

global _global_training_type
#_global_training_type = "simulation"
global _global_comm_backend
#_global_comm_backend = backend

They have probably been using the default for the one_line.py files for debugging. The step_by_step.py file does not use this default: it uses the GPU for me when I run the step_by_step.py file, OR when I comment out those lines in the __init__.py file and run the one_line.py example.
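
Roughly, the step_by_step example instead builds everything from the parsed args, along these lines (a sketch; the exact helper names may differ between versions):

    import fedml
    from fedml import FedMLRunner

    if __name__ == "__main__":
        # init() parses --cf and loads your YAML into args
        args = fedml.init()

        # device selection honors device_args (using_gpu, gpu_id)
        device = fedml.device.get_device(args)

        # dataset and model are created from data_args / model_args
        dataset, output_dim = fedml.data.load(args)
        model = fedml.model.create(args, output_dim)

        # run the FedAvg simulation
        fedml_runner = FedMLRunner(args, device, dataset, model)
        fedml_runner.run()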

mlpotter avatar May 09 '22 16:05 mlpotter

Cool, thanks. Now I found another mistake in the doc: when using the parameters copied from step 2 in https://doc.fedml.ai/simulation/examples/sp_fedavg_mnist_lr_example.html, there are further attribute errors like "'Argument' object has no attribute 'client_id_list'", "'Argument' object has no attribute 'log_file_dir'", etc.

And after I manually appended these args, another exception is raised, "no such setting", at line 103:

elif args.training_type == "cross_device":
    args.rank = 0  # only server runs on Python package
else:
    raise Exception("no such setting")
return args

It is weird; are we using the same version of the code?

JLU-Neal avatar May 09 '22 17:05 JLU-Neal

I am using fedml 0.7.24

My config file

common_args:
  training_type: "simulation"
  random_seed: 0

data_args:
  dataset: "mnist"
  data_cache_dir: "../../../data/mnist"
  partition_method: "hetero"
  partition_alpha: 0.5

model_args:
  model: "lr"

train_args:
  federated_optimizer: "FedAvg"
  client_id_list: "[]"
  client_num_in_total: 1000
  client_num_per_round: 10
  comm_round: 200
  epochs: 1
  batch_size: 10
  client_optimizer: sgd
  learning_rate: 0.03
  weight_decay: 0.001

validation_args:
  frequency_of_the_test: 5

device_args:
  using_gpu: true
  gpu_id: "0"

comm_args:
  backend: "single_process"
  is_mobile: 0

tracking_args:
  log_file_dir: ./log
  enable_wandb: false
  wandb_key: ee0b5f53d949c84cee7decbe7a629e63fb2f8408
  wandb_entity: fedml-ai
  wandb_project: simulation
  run_name: fedml_torch_fedavg_mnist_lr

For step_by_step I run

    python torch_fedavg_mnist_lr_step_by_step_example.py --cf=/home/mpotter/FedML/python/examples/simulation/sp_fedavg_mnist_lr_example/fedml_config.yaml

from the folder ~/FedML/python/examples/simulation/sp_fedavg_mnist_lr_example.

To run the one_line.py example with the GPU, I had to make the changes I mentioned before.

You can see it uses the GPU by looking at the device selected (I have some added random printouts because I played with the source code; just ignore those). Screenshot from 2022-05-09 10-35-38
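
For what it's worth, the device selection presumably boils down to something like this (my own illustration of the general mechanism, not the actual FedML source):

    import torch

    def pick_device(using_gpu: bool, gpu_id: str) -> torch.device:
        # Mirror device_args from the YAML: use the requested GPU if CUDA
        # is available, otherwise fall back to the CPU.
        if using_gpu and torch.cuda.is_available():
            return torch.device("cuda:" + gpu_id)
        return torch.device("cpu")

    # e.g. with the config above: pick_device(True, "0") -> device(type='cuda', index=0)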

mlpotter avatar May 09 '22 17:05 mlpotter

@JLU-Neal @mlpotter I've optimized the source code and solved your issues. Please pip install fedml==0.7.27

chaoyanghe avatar May 10 '22 01:05 chaoyanghe

> Now I found another mistake in the doc: ... attribute errors like "'Argument' object has no attribute 'client_id_list'" ... another exception is raised, "no such setting" ... are we using the same version of the code?

Hi, I've updated the doc. The source code is correct; the issue is that the doc wasn't in sync with the code.

chaoyanghe avatar May 10 '22 02:05 chaoyanghe

How can I change the dataset in the simulation, for instance cifar10 instead of mnist? Thanks

trannict avatar May 10 '22 07:05 trannict

> How can I change the dataset in the simulation, for instance cifar10 instead of mnist? Thanks

https://github.com/FedML-AI/FedML-refactor/blob/master/python/fedml/data/data_loader.py
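
In the config, switching datasets usually just means changing data_args (and picking a model suited to the dataset), roughly like below. The exact dataset and model names that are accepted depend on the version, so check data_loader.py and the model zoo:

    data_args:
      dataset: "cifar10"
      data_cache_dir: "../../../data/cifar10"
      partition_method: "hetero"
      partition_alpha: 0.5

    model_args:
      model: "resnet18_gn"   # example name only; "lr" targets MNIST, so pick a CIFAR-capable model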

chaoyanghe avatar May 10 '22 08:05 chaoyanghe

> How can I change the dataset in the simulation, for instance cifar10 instead of mnist? Thanks
>
> https://github.com/FedML-AI/FedML-refactor/blob/master/python/fedml/data/data_loader.py

Thanks a lot for your comments.

trannict avatar May 10 '22 08:05 trannict

I think the link is down, please check it :)

trannict avatar May 10 '22 08:05 trannict

@trannict @JLU-Neal have you solved these issues?

chaoyanghe avatar Aug 19 '22 16:08 chaoyanghe

Hey, @chaoyanghe I am not working on this currently :D You can close the issue if you think it is solved!

JLU-Neal avatar Aug 19 '22 17:08 JLU-Neal

@JLU-Neal OK. This issue is quite outdated and we have iterated a lot over the past few months. Please switch to our new version if you still need to use FedML.

chaoyanghe avatar Aug 19 '22 17:08 chaoyanghe