About the args in FedML parrot examples.
Following step 3 in the docs (https://doc.fedml.ai/simulation/examples/sp_fedavg_mnist_lr_example.html), I am told to execute the following command to run the example code:

```
python torch_fedavg_mnist_lr_one_line_example.py --cf fedml_config.yaml
```

However, when I modified the args in this YAML file (e.g. set `using_gpu` to `true`), I found that at runtime the training still ran on the CPU.
So I checked the code in fedml/lib/python3.7/site-packages/fedml/arguments.py, line 63, and found the following snippet:

```python
path_current_file = path.abspath(path.dirname(__file__))
if training_type == "simulation" and comm_backend == "single_process":
    config_file = path.join(path_current_file, "config/simulation_sp/fedml_config.yaml")
    cmd_args.yaml_config_file = config_file
elif training_type == "simulation" and comm_backend == "MPI":
    config_file = path.join(
        path_current_file, "config/simulaton_mpi/fedml_config.yaml"
    )
    cmd_args.yaml_config_file = config_file
elif training_type == "cross_silo":
    pass
elif training_type == "cross_device":
    pass
else:
    pass
```

It seems that during simulation, no matter how you set the YAML file, the default one is loaded.
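The override can be reproduced with a minimal, self-contained sketch (the class, function names, and paths here are illustrative, not fedml's actual API):

```python
from os import path

class Args:
    """Stand-in for the parsed command-line arguments object."""
    pass

def load_config_path(cmd_args, training_type, comm_backend):
    # Mimics the snippet above: hardcoded defaults clobber the --cf value.
    here = "/path/to/site-packages/fedml"  # illustrative install location
    if training_type == "simulation" and comm_backend == "single_process":
        cmd_args.yaml_config_file = path.join(here, "config/simulation_sp/fedml_config.yaml")
    elif training_type == "simulation" and comm_backend == "MPI":
        cmd_args.yaml_config_file = path.join(here, "config/simulaton_mpi/fedml_config.yaml")
    # cross_silo / cross_device / None all fall through: --cf is kept

args = Args()
args.yaml_config_file = "my_fedml_config.yaml"  # what the user passed via --cf
load_config_path(args, "simulation", "single_process")
print(args.yaml_config_file)  # hardcoded default path, not my_fedml_config.yaml

args.yaml_config_file = "my_fedml_config.yaml"
load_config_path(args, None, None)  # as if the globals were never set
print(args.yaml_config_file)  # my_fedml_config.yaml is kept
```

This is why editing the YAML passed with `--cf` has no effect in the simulation branches.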
Hi, if you look at torch_fedavg_mnist_lr_one_line_example.py, it calls `run_simulation(backend="single_process")` in the `__init__.py` file of fedml (https://github.com/FedML-AI/FedML/blob/master/python/fedml/init.py). `run_simulation(backend="single_process")` sets

```python
global _global_training_type
_global_training_type = "simulation"
global _global_comm_backend
_global_comm_backend = backend
```

If you comment out the `_global_comm_backend` and `_global_training_type` assignments in `run_simulation(backend="single_process")` in `__init__.py`, they keep their default values of `None`, which bypasses the if/elif statements with the hardcoded config file paths, so the config file you specified is used instead:
```python
global _global_training_type
# _global_training_type = "simulation"
global _global_comm_backend
# _global_comm_backend = backend
```

They have probably been using the default for the one_line.py files for debugging. The step_by_step.py file does not use this default. Training uses the GPU for me when I run the step_by_step.py file, OR when I comment out those lines in `__init__.py` to run the one_line.py example.
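The mechanism at work is ordinary module-level globals; a minimal sketch (illustrative names) of why commenting out the assignments restores the `None` defaults:

```python
# Illustrative module mimicking the global pattern in fedml/__init__.py.
_global_training_type = None
_global_comm_backend = None

def run_simulation(backend="single_process"):
    # Assigning through `global` mutates the module-level values.
    global _global_training_type, _global_comm_backend
    _global_training_type = "simulation"
    _global_comm_backend = backend

def current_mode():
    return _global_training_type, _global_comm_backend

print(current_mode())  # (None, None): the hardcoded-path branches are skipped
run_simulation()
print(current_mode())  # ('simulation', 'single_process'): defaults clobber --cf
```

With the assignments commented out, `run_simulation()` leaves both globals at `None`, and the argument loader falls through to the user-specified config.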
Cool, thanks. Now I found another mistake in the doc: when using the parameters copied from step 2 in https://doc.fedml.ai/simulation/examples/sp_fedavg_mnist_lr_example.html, there are some other attribute errors like "'Argument' object has no attribute 'client_id_list'", "'Argument' object has no attribute 'log_file_dir'", etc.
And after I manually appended these args, another exception was raised: "no such setting" at line 103:

```python
elif args.training_type == "cross_device":
    args.rank = 0  # only server runs on Python package
else:
    raise Exception("no such setting")
return args
```

It is weird; are we using the same version of the code?
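Those AttributeErrors are consistent with each YAML section's keys being flattened onto a single args namespace, so any key missing from the config surfaces as a missing attribute. A stdlib-only sketch of that idea (a hypothetical helper, not fedml's actual `Arguments` implementation):

```python
# Hypothetical flattening helper: each section's keys become attributes,
# mirroring how fedml exposes e.g. args.client_id_list or args.log_file_dir.
config = {
    "common_args": {"training_type": "simulation", "random_seed": 0},
    "train_args": {"federated_optimizer": "FedAvg", "client_id_list": "[]"},
    "device_args": {"using_gpu": True, "gpu_id": "0"},
    "tracking_args": {"log_file_dir": "./log"},
}

class Args:
    def __init__(self, cfg):
        for section, values in cfg.items():
            for key, value in values.items():
                setattr(self, key, value)

args = Args(config)
print(args.using_gpu)     # True
print(args.log_file_dir)  # ./log
# A key absent from the YAML raises AttributeError on access, e.g.
# args.client_num_in_total -> "'Args' object has no attribute ..."
```

This is why a doc whose example config lags behind the code produces exactly this class of errors.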
I am using fedml 0.7.24
My config file:

```yaml
common_args:
  training_type: "simulation"
  random_seed: 0

data_args:
  dataset: "mnist"
  data_cache_dir: "../../../data/mnist"
  partition_method: "hetero"
  partition_alpha: 0.5

model_args:
  model: "lr"

train_args:
  federated_optimizer: "FedAvg"
  client_id_list: "[]"
  client_num_in_total: 1000
  client_num_per_round: 10
  comm_round: 200
  epochs: 1
  batch_size: 10
  client_optimizer: sgd
  learning_rate: 0.03
  weight_decay: 0.001

validation_args:
  frequency_of_the_test: 5

device_args:
  using_gpu: true
  gpu_id: "0"

comm_args:
  backend: "single_process"
  is_mobile: 0

tracking_args:
  log_file_dir: ./log
  enable_wandb: false
  wandb_key: ee0b5f53d949c84cee7decbe7a629e63fb2f8408
  wandb_entity: fedml-ai
  wandb_project: simulation
  run_name: fedml_torch_fedavg_mnist_lr
```
For step_by_step I run

```
python torch_fedavg_mnist_lr_step_by_step_example.py --cf=/home/mpotter/FedML/python/examples/simulation/sp_fedavg_mnist_lr_example/fedml_config.yaml
```

from the folder ~/FedML/python/examples/simulation/sp_fedavg_mnist_lr_example.

To run the one_line.py example with the GPU I had to make the changes I mentioned before. You can see it uses the GPU by looking at the device selected (I added some random printouts because I played with the source code; just ignore those).
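For reference, device selection in setups like this usually follows the pattern sketched below (a hypothetical sketch, not fedml's actual device code); the silent CPU fallback is exactly why checking the printed device matters:

```python
def select_device(using_gpu, gpu_id, cuda_available):
    """Return a device string from the config flags.

    Hypothetical logic: honor using_gpu only when CUDA is actually
    available; otherwise fall back to the CPU without raising.
    """
    if using_gpu and cuda_available:
        return "cuda:" + str(gpu_id)
    return "cpu"

print(select_device(True, "0", True))   # cuda:0
print(select_device(True, "0", False))  # cpu -- silent fallback, as in this issue
print(select_device(False, "0", True))  # cpu
```

Printing the chosen device once at startup makes the fallback visible instead of silent.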
@JLU-Neal @mlpotter I've optimized the source code and solved your issues. Please pip install fedml==0.7.27
Hi, I've updated the doc. The source code is correct; the issue is that the doc wasn't in sync with the code.
How can I change the dataset in the simulation, for instance cifar10 instead of mnist? Thanks
https://github.com/FedML-AI/FedML-refactor/blob/master/python/fedml/data/data_loader.py
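Judging from that data_loader, switching datasets should mainly be a matter of editing `data_args` in the YAML config; a hedged example (the exact dataset keys and cache path depend on your fedml version, so verify them against data_loader.py):

```yaml
data_args:
  dataset: "cifar10"                       # must match a key handled in data_loader.py
  data_cache_dir: "../../../data/cifar10"  # illustrative path; point at your data
  partition_method: "hetero"
  partition_alpha: 0.5
```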
Thanks a lot for your comments.
I think the link is down, please check it :)
@trannict @JLU-Neal have you solved these issues?
Hey, @chaoyanghe I am not working on this currently :D You can close the issue if you think it is solved!
@JLU-Neal OK. The issue is super outdated and we iterated a lot in the past few months. Please switch to our new version later if you still need to use FedML.