FedML
[customer requirement] test the compatibility issues on Windows
Error with the test/fedml_user_code/cross_silo example
OS: Windows 10, version 21H2 (internal version 19044.1645)
Python version: 3.7
Package version: fedml 0.7.12
When running the run_server.sh and run_client.sh scripts separately according to test/fedml_user_code/cross_silo/README.md, the program fails to run properly.
The output of the server:
ThinkPad@LAPTOP-M816KBBA MINGW64 /g/python projects/FedML/test/fedml_user_code/cross_silo (master)
$ bash run_server.sh
......
[mqtt_s3_multi_clients_comm_manager.py:187:_on_message_impl] --------------------------
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:218:_on_message_impl] mqtt_s3.on_message: not use s3 pack
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:182:_notify] mqtt_s3.notify: msg type = 5
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [server_manager.py:108:receive_message] receive_message. rank_id = 0, msg_type = 5.
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [fedml_server_manager.py:113:handle_message_client_status_update] sender_id = 2, all_client_is_online = True
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [fedml_aggregator.py:119:data_silo_selection] client_num_in_total = 1000, client_num_per_round = 2
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:240:send_message] mqtt_s3.send_message: starting...
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:246:send_message] mqtt_s3.send_message: msg topic = fedml_0_0_1
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:255:send_message] mqtt_s3.send_message: S3+MQTT msg sent, s3 message key = fedml_0_0_1_32f954bc-a668-4fa7-89dd-1b52d6dc8207
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:265:send_message] mqtt_s3.send_message: to python client.
Traceback (most recent call last):
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 224, in _on_message
self._on_message_impl(client, userdata, msg)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 220, in _on_message_impl
self._notify(payload_obj)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 184, in _notify
observer.receive_message(msg_type, msg_params)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\server\server_manager.py", line 111, in receive_message
handler_callback_func(msg_params)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_server_manager.py", line 118, in handle_message_client_status_update
self.send_init_msg()
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_server_manager.py", line 68, in send_init_msg
data_silo_index_list[client_idx_in_this_round],
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_server_manager.py", line 205, in send_message_init_config
self.send_message(message)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\server\server_manager.py", line 114, in send_message
self.com_manager.send_message(message)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 267, in send_message
message_key, model_params_obj
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\remote_storage.py", line 48, in write_model
ACL="public-read",
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\client.py", line 415, in _api_call
return self._make_api_call(operation_name, kwargs)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\client.py", line 732, in _make_api_call
operation_model, request_dict, request_context)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\client.py", line 751, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\endpoint.py", line 107, in make_request
return self._send_request(request_dict, operation_model)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\endpoint.py", line 180, in _send_request
request = self.create_request(request_dict, operation_model)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\endpoint.py", line 121, in create_request
operation_name=operation_model.name)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\hooks.py", line 358, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\hooks.py", line 229, in emit
return self._emit(event_name, kwargs)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\hooks.py", line 212, in _emit
response = handler(**kwargs)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\signers.py", line 95, in handler
return self.sign(operation_name, request)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\signers.py", line 167, in sign
auth.add_auth(request)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\botocore\auth.py", line 401, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:09] [INFO] [mqtt_s3_multi_clients_comm_manager.py:159:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0, user data None
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:06:09] [INFO] [mqtt_s3_status_manager.py:80:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0, user data None
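The server traceback ends in botocore's NoCredentialsError, raised while signing the S3 upload issued from write_model in remote_storage.py. A minimal sketch for checking two common credential sources before starting the server follows. It assumes the usual botocore lookup locations (the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and the shared credentials file under ~/.aws). find_aws_credential_sources is a hypothetical helper, not part of fedml or boto3, and it does not cover other sources such as instance roles.

```python
import os
from pathlib import Path


def find_aws_credential_sources(env=None, home=None):
    """Return the places where AWS credentials appear to be configured.

    Hypothetical helper: it only looks at environment variables and the
    shared credentials file, not at every source botocore supports.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []

    # Source 1: access key pair exported in the environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")

    # Source 2: shared credentials file; AWS_SHARED_CREDENTIALS_FILE
    # overrides the default location when it is set.
    default_file = home / ".aws" / "credentials"
    cred_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_file))
    if cred_file.is_file():
        sources.append(str(cred_file))

    return sources


if __name__ == "__main__":
    found = find_aws_credential_sources()
    if found:
        print("Possible credential sources:", found)
    else:
        print("No credentials found in the environment or the shared credentials file.")
```

An empty result on the machine running run_server.sh would be consistent with the "Unable to locate credentials" message.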
The output of client 1:
ThinkPad@LAPTOP-M816KBBA MINGW64 /g/python projects/FedML/test/fedml_user_code/cross_silo (master)
$ bash run_client.sh 1
......
[FedML-Client(1) @device-id-1] [Mon, 02 May 2022 15:00:00] [INFO] [client_manager.py:115:send_message] Sending message (type 5) to server
[FedML-Client(1) @device-id-1] [Mon, 02 May 2022 15:00:00] [INFO] [mqtt_s3_multi_clients_comm_manager.py:240:send_message] mqtt_s3.send_message: starting...
[FedML-Client(1) @device-id-1] [Mon, 02 May 2022 15:00:00] [INFO] [mqtt_s3_multi_clients_comm_manager.py:322:send_message] mqtt_s3.send_message: MQTT msg sent
Traceback (most recent call last):
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 150, in _on_connect
self._on_connect_impl(client, userdata, flags, rc)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 141, in _on_connect_impl
self._notify_connection_ready()
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 176, in _notify_connection_ready
observer.receive_message(msg_type, msg_params)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\client\client_manager.py", line 103, in receive_message
handler_callback_func(msg_params)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_client_manager.py", line 62, in handle_message_connection_ready
self.sys_stats_process.start()
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'botocore.client.S3'>: attribute lookup S3 on botocore.client failed
[FedML-Client(1) @device-id-1] [Mon, 02 May 2022 15:00:01] [INFO] [mqtt_s3_multi_clients_comm_manager.py:159:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0, user data None
[FedML-Client(1) @device-id-1] [Mon, 02 May 2022 15:00:01] [INFO] [mqtt_s3_status_manager.py:80:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0, user data None
ThinkPad@LAPTOP-M816KBBA MINGW64 /g/python projects/FedML/test/fedml_user_code/cross_silo (master)
$ Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
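The client traceback shows multiprocessing on Windows pickling the process object in popen_spawn_win32.py and failing on a botocore S3 client held somewhere in that object's state. Windows has no fork, so everything attached to a Process has to be picklable. One common workaround, sketched here under assumptions, is to leave the client out of the pickled state and rebuild it lazily in the child. The class names are hypothetical, and threading.Lock stands in for the S3 client only because it is another object that pickle rejects. This is not FedML's actual fix.

```python
import pickle
import threading


class NaiveReporter:
    """Holds an unpicklable resource directly (stand-in for an S3 client)."""

    def __init__(self):
        self.client = threading.Lock()


class LazyReporter:
    """Creates the resource on first use and drops it from the pickled state."""

    def __init__(self):
        self.client = None

    def get_client(self):
        # Build the resource only when it is needed, in whichever
        # process calls this method.
        if self.client is None:
            self.client = threading.Lock()
        return self.client

    def __getstate__(self):
        # Copy the instance state and blank out the unpicklable member,
        # so that the object can be sent to a spawned child process.
        state = self.__dict__.copy()
        state["client"] = None
        return state


def can_pickle(obj):
    """Return True if pickle.dumps succeeds for obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    lazy = LazyReporter()
    lazy.get_client()
    print("naive picklable:", can_pickle(NaiveReporter()))
    print("lazy picklable:", can_pickle(lazy))
```

The trailing EOFError in the spawned child is a follow-on effect: the parent aborted while writing the pickle, so the child found nothing to read.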
The output of client 2:
ThinkPad@LAPTOP-M816KBBA MINGW64 /g/python projects/FedML/test/fedml_user_code/cross_silo (master)
$ bash run_client.sh 2
......
[FedML-Client(2) @device-id-1] [Mon, 02 May 2022 14:56:59] [INFO] [device.py:35:get_device] device = cpu
[FedML-Client(2) @device-id-1] [Mon, 02 May 2022 14:56:59] [INFO] [data_loader.py:22:download_mnist] ../../../../data/mnist/MNIST.zip
[FedML-Client(2) @device-id-1] [Mon, 02 May 2022 14:57:16] [INFO] [data_loader.py:57:load_synthetic_data] load_data. dataset_name = mnist
Traceback (most recent call last):
File "client/torch_client.py", line 5, in <module>
fedml.run_cross_silo_client()
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\__init__.py", line 137, in run_cross_silo_client
dataset, output_dim = fedml.data.load(args)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\data\data_loader.py", line 30, in load
return load_synthetic_data(args)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\data\data_loader.py", line 71, in load_synthetic_data
test_path=args.data_cache_dir + "/MNIST/test",
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\data\MNIST\data_loader.py", line 114, in load_partition_data_mnist
users, groups, train_data, test_data = read_data(train_path, test_path)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\data\MNIST\data_loader.py", line 56, in read_data
cdata = json.load(inf)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\json\__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\json\decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
MemoryError
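The second client fails with a MemoryError inside json.load while reading the MNIST partition files. One thing worth ruling out, as a guess rather than a diagnosis, is a 32-bit Python build, since its address space is much smaller than the RAM installed on most machines. The sketch below reports the interpreter's pointer width using only the standard library; interpreter_bits is a hypothetical helper name.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python", platform.python_version(), "pointer width:", interpreter_bits())
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If the build is already 64-bit, the more likely explanation is simply that several processes each load the full dataset at the same time.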
@chaoyanghe
@alex-liang-kh @Nicole456 Hi Alex, we need to wrap the S3 service as an HTTPS API for our users.
Problem with the test/fedml_user_code/simulation_mpi example
When running python main.py under test/fedml_user_code/simulation_mpi, the program gets stuck here for a long time without any further output:
FedML/test/fedml_user_code/simulation_mpi (master)
$ python main.py
......
################## You do not indicate gpu_util_file, will use CPU training #################
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:05] [INFO] [gpu_mapping.py:17:mapping_processes_to_gpu_device_from_yaml_file] cpu
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:05] [INFO] [data_loader.py:22:download_mnist] ./data/mnist/MNIST.zip
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:11] [INFO] [data_loader.py:57:load_synthetic_data] load_data. dataset_name = mnist
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:53] [INFO] [data_loader.py:126:load_partition_data_mnist] loading data...
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:57] [INFO] [data_loader.py:144:load_partition_data_mnist] finished the loading data
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:58] [INFO] [model_hub.py:16:create] create_model. model_name = lr, output_dim = 10
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:58] [INFO] [model_hub.py:19:create] LogisticRegression + MNIST
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 15:21:58] [INFO] [FedAVGAggregator.py:112:client_sampling] client_indexes = [993 859 298 553]
@chaoyanghe
Error with test/fedml_user_code/cross_device
When I try to run the server by following step 4 of the README ("start the python server at python/examples/cross_device/mqtt_s3_fedavg_mnist_lr_example/custum_data_and_model/", using bash run_server.sh), the server side reports the following error:
/FedML/test/fedml_user_code/cross_device (master)
$ bash run_server.sh
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 16:30:45] [INFO] [__init__.py:30:init] args = {'yaml_config_file': './config/fedml_config.yaml', 'run_id': '189', 'rank': 0, 'yaml_paths': ['D:\\ProgramData\\Miniconda3\\envs\\FedML0502\\lib\\site-packages\\fedml\\config/simulation_sp/fedml_config.yaml'], 'training_type': 'simulation', 'using_mlops': False, 'random_seed': 0, 'dataset': 'mnist', 'data_cache_dir': './data/mnist', 'partition_method': 'hetero', 'partition_alpha': 0.5, 'model': 'lr', 'federated_optimizer': 'FedAvg', 'client_id_list': '[]', 'client_num_in_total': 1000, 'client_num_per_round': 10, 'comm_round': 200, 'epochs': 1, 'batch_size': 10, 'client_optimizer': 'sgd', 'learning_rate': 0.03, 'weight_decay': 0.001, 'frequency_of_the_test': 5, 'using_gpu': False, 'gpu_id': 0, 'backend': 'single_process', 'log_file_dir': './log', 'enable_wandb': False}
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 16:30:45] [INFO] [device.py:14:get_device] device = cpu
Traceback (most recent call last):
File "torch_server.py", line 32, in <module>
create_mnn_lenet5_model(args.global_model_file_path)
AttributeError: 'Arguments' object has no attribute 'global_model_file_path'
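The AttributeError means the parsed arguments carry no global_model_file_path entry. The args dump in the same output lists yaml_paths pointing at the packaged simulation_sp config and a training_type of 'simulation', which suggests the cross-device settings were not the ones loaded. A minimal sketch of a guard that fails with a clearer message follows; require_arg is a hypothetical helper, and SimpleNamespace stands in for FedML's Arguments object.

```python
from types import SimpleNamespace


def require_arg(args, name):
    """Return args.<name>, or raise a descriptive error if it is absent."""
    value = getattr(args, name, None)
    if value is None:
        raise ValueError(
            f"'{name}' is missing from the loaded configuration; "
            "check that the intended YAML file was passed with --cf"
        )
    return value


if __name__ == "__main__":
    # Stand-in for the parsed arguments shown in the log, which lack
    # the global_model_file_path key.
    args = SimpleNamespace(model="lr", dataset="mnist")
    try:
        require_arg(args, "global_model_file_path")
    except ValueError as exc:
        print(exc)
```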
@chaoyanghe
@Nicole456 the latest two issues you reported are fixed. Please check out fedml==0.7.13
Problem with the test/fedml_user_code/cross_device example
The device always stays in the initialized state.
The server gets stuck here and cannot continue; the output contains this error message:
[Errno 2] No such file or directory: './model_file_cache/global_model.mnn'
register_message_receive_handlers------
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:21] [INFO] [mqtt_s3_comm_manager.py:105:_on_connect_impl] mqtt_s3.on_connect: connection returned with result code:0
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:21] [INFO] [mqtt_s3_comm_manager.py:117:_on_connect_impl] mqtt_s3.on_connect: server subscribes real_topic = fedml_189_146, mid = 1, result = 0
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:21] [INFO] [mqtt_s3_comm_manager.py:145:_on_subscribe] mqtt_s3.onSubscribe: mid = 1
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mqtt_s3_comm_manager.py:163:_on_message_impl] --------------------------
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mqtt_s3_comm_manager.py:166:_on_message_impl] mqtt_s3.on_message: payload_obj {'client_os': 'Android', 'client_status': 'ONLINE', 'msg_type': 5, 'receiver': 0, 'sender': 146}
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mqtt_s3_comm_manager.py:183:_on_message_impl] mqtt_s3.on_message: not use s3 pack
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mqtt_s3_comm_manager.py:158:_notify] mqtt_s3.notify: msg type = 5
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [server_manager.py:108:receive_message] receive_message. rank_id = 0, msg_type = 5.
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mlops_profiler_event.py:54:log_event_started] Event started, {"run_id": "189", "edge_id": 0, "event_name": "aggregator.wait-online", "event_value": "", "started_time": 1651495342}
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [fedml_server_manager.py:180:handle_message_client_status_update] sender_id = 146, all_client_is_online = True
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [fedml_aggregator.py:112:data_silo_selection] data_silo_num_in_total = 1, client_num_in_total = 1
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [fedml_server_manager.py:139:send_init_msg] client_id_list_in_this_round = [146], data_silo_index_list = [0]
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [fedml_server_manager.py:263:send_message_init_config] global_model_params = ./model_file_cache/global_model.mnn
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mqtt_s3_comm_manager.py:205:send_message] mqtt_s3.send_message: starting...{'msg_type': 1, 'sender': 0, 'receiver': 146, 'model_params': './model_file_cache/global_model.mnn', 'client_idx': '0'}
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mqtt_s3_comm_manager.py:212:send_message] mqtt_s3.send_message: msg topic = fedml_189_0_146
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mqtt_s3_comm_manager.py:221:send_message] mqtt_s3.send_message: S3+MQTT msg sent, s3 message key = fedml_189_0_146_161e9da4-2ddf-4d28-a54d-f381226fab59
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [ERROR] [remote_storage.py:105:upload_file] Upload data failed. | src: ./model_file_cache/global_model.mnn | dest: fedml_189_0_146_161e9da4-2ddf-4d28-a54d-f381226fab59 | Exception: [Errno 2] No such file or directory: './model_file_cache/global_model.mnn'
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mlops_profiler_event.py:54:log_event_started] Event started, {"run_id": "189", "edge_id": 0, "event_name": "server.wait", "event_value": "", "started_time": 1651495342}
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:42:22] [INFO] [mlops_profiler_event.py:76:log_event_ended] Event ended, {"run_id": "189", "edge_id": 0, "event_name": "aggregator.wait-online", "event_value": "", "ended_time": 1651495342}
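The ERROR line shows upload_file failing because ./model_file_cache/global_model.mnn does not exist relative to the directory the server was started from. A small pre-flight check along these lines could surface that before any client connects. check_model_file is a hypothetical helper, and the path is the one printed in the log.

```python
from pathlib import Path


def check_model_file(path, base_dir=None):
    """Resolve path against base_dir (default: current directory) and verify it exists."""
    base = Path.cwd() if base_dir is None else Path(base_dir)
    resolved = (base / path).resolve()
    if not resolved.is_file():
        # Report the absolute location so a wrong working directory is obvious.
        raise FileNotFoundError(f"global model file not found: {resolved}")
    return resolved


if __name__ == "__main__":
    try:
        print("Found", check_model_file("./model_file_cache/global_model.mnn"))
    except FileNotFoundError as exc:
        print(exc)
```

Printing the absolute path helps distinguish a model file that was never generated from one that exists under a different working directory.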
@chaoyanghe
Problem with test/fedml_user_code/simulation_mpi
sh run_one_line_example.sh 4
Since mpirun on Windows corresponds to mpiexec, I modified run_one_line_example.sh as follows:
#!/usr/bin/env bash
hostname > mpi_host_file
mpiexec -np 5 \
#-hostfile mpi_host_file \
python main.py --cf fedml_config.yaml
The program is stuck in the following state:
ThinkPad@LAPTOP-M816KBBA MINGW64 /g/python projects/FedML/test/fedml_user_code/simulation_mpi (master)
$ sh run_one_line_example.sh 4
Error: no executable specified.
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:02:53] [INFO] [__init__.py:30:init] args = {'yaml_config_file': 'fedml_config.yaml', 'run_id': '0', 'rank': 0, 'yaml_paths': ['D:\\ProgramData\\Miniconda3\\envs\\FedML0502\\lib\\site-packages\\fedml\\config/simulaton_mpi/fedml_config.yaml'], 'training_type': 'simulation', 'using_mlops': False, 'random_seed': 0, 'dataset': 'mnist', 'data_cache_dir': './data/mnist', 'partition_method': 'hetero', 'partition_alpha': 0.5, 'model': 'lr', 'federated_optimizer': 'FedAvg', 'client_id_list': '[]', 'client_num_in_total': 1000, 'client_num_per_round': 4, 'comm_round': 50, 'epochs': 1, 'batch_size': 10, 'client_optimizer': 'sgd', 'learning_rate': 0.03, 'weight_decay': 0.001, 'frequency_of_the_test': 5, 'worker_num': 4, 'using_gpu': False, 'gpu_mapping_file': 'D:\\ProgramData\\Miniconda3\\envs\\FedML0502\\lib\\site-packages\\fedml\\config/simulaton_mpi/gpu_mapping.yaml', 'gpu_mapping_key': 'mapping_default', 'backend': 'MPI', 'is_mobile': 0, 'log_file_dir': './log', 'enable_wandb': False, 'wandb_key': 'ee0b5f53d949c84cee7decbe7a629e63fb2f8408', 'wandb_project': 'fedml', 'wandb_name': 'fedml_torch_fedavg_mnist_lr'}
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:02:53] [INFO] [gpu_mapping.py:13:mapping_processes_to_gpu_device_from_yaml_file] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:02:53] [INFO] [gpu_mapping.py:15:mapping_processes_to_gpu_device_from_yaml_file] ################## You do not indicate gpu_util_file, will use CPU training #################
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:02:53] [INFO] [gpu_mapping.py:17:mapping_processes_to_gpu_device_from_yaml_file] cpu
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:02:53] [INFO] [data_loader.py:22:download_mnist] ./data/mnist/MNIST.zip
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:02:59] [INFO] [data_loader.py:57:load_synthetic_data] load_data. dataset_name = mnist
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:04:01] [INFO] [data_loader.py:126:load_partition_data_mnist] loading data...
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:04:06] [INFO] [data_loader.py:144:load_partition_data_mnist] finished the loading data
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:04:07] [INFO] [model_hub.py:16:create] create_model. model_name = lr, output_dim = 10
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:04:07] [INFO] [model_hub.py:19:create] LogisticRegression + MNIST
[FedML-Server(0) @device-id-0] [Mon, 02 May 2022 20:04:07] [INFO] [FedAVGAggregator.py:112:client_sampling] client_indexes = [993 859 298 553]
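The first line of this output, "Error: no executable specified.", is consistent with how bash parses the modified script. The commented-out #-hostfile line ends the mpiexec command, because a comment runs to the end of its line and the trailing backslash inside it does not continue anything. mpiexec -np 5 therefore runs with no program, and python main.py then starts on its own as a single process, which would explain a server that waits indefinitely for workers. One way to sidestep shell continuation rules is to build the command as a list, as in this sketch. build_mpi_command is a hypothetical helper, and the one-server-plus-N-workers process count is an assumption based on -np 5 being used with 4 workers.

```python
import sys


def build_mpi_command(worker_num, script="main.py", config="fedml_config.yaml", python="python"):
    """Build the mpiexec argument list for one server plus worker_num workers."""
    # Assumption: the total process count is the worker count plus one server.
    process_num = worker_num + 1
    return ["mpiexec", "-np", str(process_num), python, script, "--cf", config]


if __name__ == "__main__":
    workers = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    cmd = build_mpi_command(workers)
    # The list can be handed to subprocess.run(cmd, check=True); it is only
    # printed here so the sketch runs without an MPI installation.
    print(" ".join(cmd))
```

Because every argument is a separate list element, an optional flag can be left out without commenting out part of a continued shell line.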
Regarding the problem with test/fedml_user_code/simulation_mpi (sh run_one_line_example.sh 4, quoted in full above): please update your source code. It's fixed.
Regarding './model_file_cache/global_model.mnn': please report your step-by-step operation.
In reply to "please update your source code. It's fixed." (test/fedml_user_code/simulation_mpi, sh run_one_line_example.sh 4):
I successfully ran this example by running the following command:
mpiexec -np 2 python main.py --cf fedml_config.yaml
On my Windows 10 PC, setting -np 5 causes a MemoryError, and I can't run the case via sh run_one_line_example.sh 4.
P.S. I modified run_one_line_example.sh as follows:
#!/usr/bin/env bash
hostname > mpi_host_file
mpiexec -np 2 \
#--hostfile mpi_host_file \
python main.py --cf fedml_config.yaml
That is, after modifying run_one_line_example.sh, I ran sh run_one_line_example.sh and still got the results reported before.
@Nicole456 please summarize the remaining issues you have on Windows for all the test cases. It's a long thread, and I may have missed some of the errors you reported.
The remaining issues on Windows
test/fedml_user_code/simulation_mpi
I successfully ran this example by running the following command:
mpiexec -np 2 python main.py --cf fedml_config.yaml
On my PC, setting -np 5 causes a MemoryError, and I can't run the case via sh run_one_line_example.sh 4.
P.S. I modified run_one_line_example.sh as follows:
#!/usr/bin/env bash
hostname > mpi_host_file
mpiexec -np 2 \
#--hostfile mpi_host_file \
python main.py --cf fedml_config.yaml
That is, after modifying run_one_line_example.sh, I ran sh run_one_line_example.sh and still got the results reported before: the program gets stuck at the point shown below:
/python projects/FedML/test/fedml_user_code/simulation_mpi (master)
$ bash run_one_line_example.sh
Error: no executable specified.
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:19:46] [INFO] [__init__.py:30:init] args = {'yaml_config_file': 'fedml_config.yaml', 'run_id': '0', 'rank': 0, 'yaml_paths': ['D:\\ProgramData\\Miniconda3\\envs\\FedML0502\\lib\\site-packages\\fedml\\config/simulaton_mpi/fedml_config.yaml'], 'training_type': 'simulation', 'using_mlops': False, 'random_seed': 0, 'dataset': 'mnist', 'data_cache_dir': './data/mnist', 'partition_method': 'hetero', 'partition_alpha': 0.5, 'model': 'lr', 'federated_optimizer': 'FedAvg', 'client_id_list': '[]', 'client_num_in_total': 1000, 'client_num_per_round': 4, 'comm_round': 50, 'epochs': 1, 'batch_size': 10, 'client_optimizer': 'sgd', 'learning_rate': 0.03, 'weight_decay': 0.001, 'frequency_of_the_test': 5, 'worker_num': 4, 'using_gpu': False, 'gpu_mapping_file': 'D:\\ProgramData\\Miniconda3\\envs\\FedML0502\\lib\\site-packages\\fedml\\config/simulaton_mpi/gpu_mapping.yaml', 'gpu_mapping_key': 'mapping_default', 'backend': 'MPI', 'is_mobile': 0, 'log_file_dir': './log', 'enable_wandb': False, 'wandb_key': 'ee0b5f53d949c84cee7decbe7a629e63fb2f8408', 'wandb_project': 'fedml', 'wandb_name': 'fedml_torch_fedavg_mnist_lr'}
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:19:46] [INFO] [gpu_mapping.py:13:mapping_processes_to_gpu_device_from_yaml_file] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:19:46] [INFO] [gpu_mapping.py:15:mapping_processes_to_gpu_device_from_yaml_file] ################## You do not indicate gpu_util_file, will use CPU training #################
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:19:46] [INFO] [gpu_mapping.py:17:mapping_processes_to_gpu_device_from_yaml_file] cpu
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:19:46] [INFO] [data_loader.py:22:download_mnist] ./data/mnist/MNIST.zip
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:19:52] [INFO] [data_loader.py:57:load_synthetic_data] load_data. dataset_name = mnist
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:20:33] [INFO] [data_loader.py:126:load_partition_data_mnist] loading data...
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:20:38] [INFO] [data_loader.py:144:load_partition_data_mnist] finished the loading data
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:20:39] [INFO] [model_hub.py:16:create] create_model. model_name = lr, output_dim = 10
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:20:39] [INFO] [model_hub.py:19:create] LogisticRegression + MNIST
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:20:39] [INFO] [FedAVGAggregator.py:112:client_sampling] client_indexes = [993 859 298 553]
test/fedml_user_code/cross_device
- adb push the data to your Android device
  I did this by manually downloading the dataset and then transferring it to the phone via ADB.
- Launch the Android device and bind it to open.fedml.ai.
- Check the device ID at open.fedml.ai, and change the edge ID in the test scripts.
- Start the Python server at python/examples/cross_device/mqtt_s3_fedavg_mnist_lr_example/custum_data_and_model/
  bash run_server.sh
  The exception occurs in this step. After finishing the previous steps, I ran python torch_server.py --cf ./config/fedml_config.yaml --rank 0 --run_id 189 and got the following error:
ThinkPad@LAPTOP-M816KBBA MINGW64 /g/python projects/FedML/test/fedml_user_code/cross_device (master)
$ python torch_server.py --cf ./config/fedml_config.yaml --rank 0 --run_id 189
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:00:42] [INFO] [__init__.py:30:init] args = {'yaml_config_file': './config/fedml_config.yaml', 'run_id': '189', 'rank': 0, 'yaml_paths': ['./config/fedml_config.yaml'], 'training_type'
: 'cross_device', 'using_mlops': False, 'random_seed': 0, 'dataset': 'mnist', 'data_cache_dir': '../../../data/mnist', 'partition_method': 'hetero', 'partition_alpha': 0.5, 'model': 'lr', 'model_file_cache_folder': './model_file_cache'
, 'global_model_file_path': './model_file_cache/global_model.mnn', 'federated_optimizer': 'FedAvg', 'client_id_list': '[150]', 'client_num_in_total': 1, 'client_num_per_round': 1, 'comm_round': 3, 'epochs': 1, 'batch_size': 100, 'batch
_num': -1, 'client_optimizer': 'sgd', 'learning_rate': 0.03, 'weight_decay': 0.001, 'frequency_of_the_test': 5, 'worker_num': 1, 'using_gpu': False, 'gpu_mapping_file': 'config/gpu_mapping.yaml', 'gpu_mapping_key': 'mapping_default', '
backend': 'MQTT_S3_MNN', 'mqtt_config_path': 'config/mqtt_config.yaml', 's3_config_path': 'config/s3_config.yaml', 'log_file_dir': './log', 'enable_wandb': False, 'wandb_obj': '', 'wandb_key': 'ee0b5f53d949c84cee7decbe7a629e63fb2f8408'
, 'wandb_project': 'fedml', 'wandb_name': 'fedml_torch_fedavg_mnist_lr'}
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:00:42] [INFO] [device.py:43:get_device] device = cpu
Traceback (most recent call last):
File "torch_server.py", line 32, in <module>
create_mnn_lenet5_model(args.global_model_file_path)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\model\mobile\mnn_lenet.py", line 35, in create_mnn_lenet5_model
F.save([predicts], mnn_file_path)
RuntimeError: Caught an unknown exception!
By debugging, I found that the error is raised by the last line of the following code:
\lib\site-packages\fedml\model\mobile\mnn_lenet.py
def create_mnn_lenet5_model(mnn_file_path):
    net = Lenet5()
    input_var = MNN.expr.placeholder([1, 1, 28, 28], MNN.expr.NCHW)
    predicts = net.forward(input_var)
    F.save([predicts], mnn_file_path)
There may be a problem with the MNN library on Windows.
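One cause worth ruling out before blaming MNN itself is a missing output directory: the args above set global_model_file_path to ./model_file_cache/global_model.mnn, and a save call can fail if ./model_file_cache does not exist yet. This is only a guess, since the traceback does not say why F.save failed. A minimal sketch of the check, where ensure_parent_dir is a hypothetical helper and not part of FedML or MNN:

```python
import os


def ensure_parent_dir(file_path):
    """Create the directory that will hold file_path if it is missing.

    Hypothetical helper for illustration; not part of FedML or MNN.
    """
    parent = os.path.dirname(os.path.abspath(file_path))
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(parent, exist_ok=True)
    return parent
```

Calling ensure_parent_dir(args.global_model_file_path) in torch_server.py just before create_mnn_lenet5_model(...) would show whether the directory is the issue. If the error persists, the problem is more likely in the MNN build for Windows.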
test/fedml_user_code/cross_silo
Following the steps in the cross-silo README, I ran the server and both clients and got the following output:
server:
ThinkPad@LAPTOP-M816KBBA MINGW64 /g/python projects/FedML/test/fedml_user_code/cross_silo (master)
$ bash run_server.sh
......
register_message_receive_handlers------
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:13:34] [INFO] [mqtt_s3_multi_clients_comm_manager.py:117:_on_connect_impl] mqtt_s3.on_connect: connection returned with result
code:0
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:13:34] [INFO] [mqtt_s3_multi_clients_comm_manager.py:130:_on_connect_impl] mqtt_s3.on_connect: subscribes real_topic = fedml_0_
1, mid = 1, result = 0
Traceback (most recent call last):
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 150, in _on_connect
self._on_connect_impl(client, userdata, flags, rc)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 123, in _on_connect_im
pl
real_topic = self._topic + str(self.client_real_ids[client_rank])
IndexError: list index out of range
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:13:35] [INFO] [mqtt_s3_multi_clients_comm_manager.py:159:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0
, user data None
[FedML-Server(0) @device-id-0] [Tue, 03 May 2022 16:13:35] [INFO] [mqtt_s3_status_manager.py:80:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0, user data None
client1:
.....
[FedML-Client(1) @device-id-1] [Tue, 03 May 2022 16:14:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:322:send_message] mqtt_s3.send_message: MQTT msg sent
Traceback (most recent call last):
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 150, in _on_connect
self._on_connect_impl(client, userdata, flags, rc)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 141, in _on_connect_im
pl
self._notify_connection_ready()
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\communication\mqtt_s3\mqtt_s3_multi_clients_comm_manager.py", line 176, in _notify_connec
tion_ready
observer.receive_message(msg_type, msg_params)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\core\distributed\client\client_manager.py", line 103, in receive_message
handler_callback_func(msg_params)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_client_manager.py", line 62, in handle_message_connection_ready
self.sys_stats_process.start()
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'botocore.client.S3'>: attribute lookup S3 on botocore.client failed
[FedML-Client(1) @device-id-1] [Tue, 03 May 2022 16:14:08] [INFO] [mqtt_s3_multi_clients_comm_manager.py:159:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0
, user data None
[FedML-Client(1) @device-id-1] [Tue, 03 May 2022 16:14:08] [INFO] [mqtt_s3_status_manager.py:80:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0, user data None
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
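The PicklingError in the client1 traceback fits how multiprocessing works on Windows. Only the spawn start method is available there (note popen_spawn_win32.py in the traceback), and it pickles the Process object to hand it to the child, so everything reachable from that object must be picklable. The sketch below reproduces the pattern with the standard library only. StatsReporter and report_worker are hypothetical names, and a threading.Lock stands in for the botocore S3 client. The logs do not show how FedML builds sys_stats_process, so this is an illustration of the failure mode, not a patch.

```python
import pickle
import threading


class StatsReporter:
    """Hypothetical object that owns an unpicklable client."""

    def __init__(self):
        # A lock stands in for the S3 client: neither can be pickled.
        self.client = threading.Lock()

    def report(self):
        """Would upload system stats through self.client."""


def can_pickle(obj):
    """Return True if obj can be pickled, which spawn requires of a Process."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def report_worker(config):
    """Module-level target that creates its own client inside the child."""
    client = threading.Lock()  # placeholder for building the real client
    return config["interval_s"], client


reporter = StatsReporter()
config = {"interval_s": 10}

# A bound method carries `reporter`, and with it the unpicklable client.
print("bound method picklable:", can_pickle(reporter.report))
# Plain data passed as args to a module-level function pickles cleanly.
print("plain config picklable:", can_pickle(config))
```

With this layout one would pass target=report_worker and args=(config,) to multiprocessing.Process, instead of a method of an object that holds the client. The EOFError that follows in the log is the child process failing to read the Process object that the parent never finished writing.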
client2:
Traceback (most recent call last):
File "client/torch_client.py", line 5, in <module>
fedml.run_cross_silo_client()
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\__init__.py", line 143, in run_cross_silo_client
client = ClientCrossSilo(args, device, dataset, model)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\client.py", line 16, in __init__
preprocessed_sampling_lists=None,
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_horizontal_api.py", line 60, in FedML_Horizontal
model_trainer,
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_horizontal_api.py", line 155, in init_client
client_manager = FedMLClientManager(args, trainer, comm, client_rank, client_num, backend)
File "D:\ProgramData\Miniconda3\envs\FedML0502\lib\site-packages\fedml\cross_silo\horizontal\fedml_client_manager.py", line 26, in __init__
self.client_real_id = self.client_real_ids[self.get_sender_id() - 1]
IndexError: list index out of range
[FedML-Client(2) @device-id-1] [Tue, 03 May 2022 16:14:10] [INFO] [mqtt_s3_multi_clients_comm_manager.py:159:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0
, user data None
[FedML-Client(2) @device-id-1] [Tue, 03 May 2022 16:14:10] [INFO] [mqtt_s3_status_manager.py:80:_on_disconnect] mqtt_s3.on_disconnect: disconnection returned result 0, user data None
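Both IndexErrors (server: self.client_real_ids[client_rank]; client2: self.client_real_ids[self.get_sender_id() - 1]) are consistent with client_real_ids holding fewer entries than there are clients. The server subscribed to one topic (fedml_0_1) before failing, and client1 passed the same lookup that client2 failed, which suggests the list has a single entry. A minimal sketch of a check that would surface this before the communication manager starts, assuming client_id_list is a JSON-style string as it appears in the args dumps above (e.g. '[150]'). check_client_ids is a hypothetical helper, not a FedML function:

```python
import json


def check_client_ids(client_id_list, client_num_per_round):
    """Parse a client_id_list string and verify it covers every client rank.

    Hypothetical helper for illustration; not part of FedML.
    """
    ids = json.loads(client_id_list)
    if len(ids) < client_num_per_round:
        raise ValueError(
            f"client_id_list has {len(ids)} entries but "
            f"client_num_per_round is {client_num_per_round}"
        )
    return ids


# Two clients per round need at least two IDs.
ids = check_client_ids("[1, 2]", client_num_per_round=2)
print(ids)
```

If the check raises for the values in config/fedml_config.yaml, extending client_id_list to one ID per client (or lowering client_num_per_round) would be the first thing to try.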
@chaoyanghe
@Nicole456 let's set a meeting today to do a live debugging.
@chaoyanghe @Nicole456 Revisiting this issue. Has it been addressed?