Cannot handle connection ready

Open SuperXxts opened this issue 2 years ago • 11 comments

I found a problem in "FedML/Python/app/fedcv/object_detection/runs/train/exp/log_0.txt"

client_indexes = [0, 1]
running
receive_message. msg_type = 0, sender_id = 0, receiver_id = 0
Cannot handle connection ready
receive_message. msg_type = 3, sender_id = 1, receiver_id = 0
add_model. index = 0
b_all_received = False
receive_message. msg_type = 3, sender_id = 2, receiver_id = 0
add_model. index = 1
b_all_received = True
len of self.model_dict[idx] = 2
set_model_params
aggregate time cost: 0
round_idx: 0
Saving model at round 0
client_indexes = [0, 1]
send_message_sync_model_to_client. receive_id = 1
send_message_sync_model_to_client. receive_id = 2

The fourth line shows "Cannot handle connection ready". Why? Will it have any impact?
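One way to look at the second question is to check what the log itself records after the message. The sketch below is only a log-reading aid, not an explanation of the cause. It parses an excerpt of the log above and reports where the message appears and whether aggregation still completes afterwards. The helper name `summarize` is made up for this example.

```python
import re

# Excerpt copied from the log above.
LOG = """\
client_indexes = [0, 1]
running
receive_message. msg_type = 0, sender_id = 0, receiver_id = 0
Cannot handle connection ready
receive_message. msg_type = 3, sender_id = 1, receiver_id = 0
add_model. index = 0
b_all_received = False
receive_message. msg_type = 3, sender_id = 2, receiver_id = 0
add_model. index = 1
b_all_received = True
"""

MSG_RE = re.compile(
    r"receive_message\. msg_type = (\d+), sender_id = (\d+), receiver_id = (\d+)"
)
WARNING = "Cannot handle connection ready"


def summarize(log_text):
    """Return (messages, warning_line, aggregated_after_warning)."""
    messages = []          # (line_no, msg_type, sender_id, receiver_id)
    warning_line = None    # 1-based line number of the first warning, if any
    aggregated_after_warning = False
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        match = MSG_RE.search(line)
        if match:
            messages.append((line_no, *map(int, match.groups())))
        elif WARNING in line and warning_line is None:
            warning_line = line_no
        elif line.strip() == "b_all_received = True" and warning_line is not None:
            aggregated_after_warning = True
    return messages, warning_line, aggregated_after_warning


messages, warning_line, aggregated = summarize(LOG)
print("warning at line", warning_line)
print("messages:", messages)
print("aggregation completed after warning:", aggregated)
```

In this log the message directly follows the only message with msg_type = 0, where sender_id and receiver_id are both 0, and both client uploads (msg_type = 3) are still received and aggregated afterwards.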

SuperXxts avatar Oct 26 '22 12:10 SuperXxts

I ran into this problem too. Did you fix it?

ATEDUST avatar Oct 27 '22 11:10 ATEDUST

My problem is in FedIoT.

ATEDUST avatar Oct 27 '22 11:10 ATEDUST

I haven't solved this problem yet

SuperXxts avatar Oct 27 '22 12:10 SuperXxts

Have you solved this problem?

SuperXxts avatar Oct 28 '22 10:10 SuperXxts

No, I haven't solved it.

ATEDUST avatar Oct 30 '22 07:10 ATEDUST

Use another communication protocol. MQTT is currently the default, but its implementation has a bug. I spent a lot of time trying to work out how to fix it, and I even emailed the authors at their official address about it but got no support. So, if you don't have a specific constraint on the communication protocol, I suggest using gRPC.

See this link for how to set up the gRPC protocol:

https://github.com/FedML-AI/FedML/blob/test/v0.7.0/python/examples/cross_silo/grpc_fedavg_mnist_lr_example/custom_data_and_model/config/fedml_config.yaml
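As a rough sketch of what the switch involves, the snippet below generates the two pieces a gRPC setup of this kind typically needs: a `comm_args` fragment for `fedml_config.yaml` and a CSV that maps each process ID to an IP address. The key names (`comm_args`, `backend`, `grpc_ipconfig_path`), the value `"GRPC"`, and the CSV header `receiver_id,ip` are assumptions based on the linked example and are not confirmed in this thread, so check them against that file for your FedML version. The helper function names are made up for illustration.

```python
import csv
import io


def grpc_comm_args(ipconfig_path="config/grpc_ipconfig.csv"):
    """Return a YAML fragment for the comm_args section (assumed key names)."""
    return (
        "comm_args:\n"
        '  backend: "GRPC"\n'
        f"  grpc_ipconfig_path: {ipconfig_path}\n"
    )


def grpc_ipconfig_csv(ips):
    """Return CSV text mapping each ID to an IP (assumed: ID 0 is the server)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["receiver_id", "ip"])  # assumed header
    for receiver_id, ip in enumerate(ips):
        writer.writerow([receiver_id, ip])
    return buf.getvalue()


# One server and two clients on the same machine, as in the log above.
print(grpc_comm_args())
print(grpc_ipconfig_csv(["127.0.0.1", "127.0.0.1", "127.0.0.1"]))
```

For a run across several machines, each row would carry the address of the machine hosting that ID instead of 127.0.0.1.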

Adeelbek avatar Nov 07 '22 02:11 Adeelbek

Wow, thank you very much for the advice, it might be very useful.

SuperXxts avatar Nov 07 '22 02:11 SuperXxts

Very helpful, thanks.

ATEDUST avatar Nov 07 '22 02:11 ATEDUST

@Adeelbek @SuperXxts @ATEDUST May I know what the issue is with the MQTT_S3 communication protocol? MQTT sends requests to our public server on AWS. My guess is that your local network is not connected to the internet.

You can check your network by running fedml env: [image]
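Independently of the FedML tooling, a plain TCP probe can show whether a given host and port are reachable from the machine at all. The sketch below is a generic check, not part of FedML. The function name `can_reach` is made up, and the broker host and port are not given in this thread, so they have to come from your own configuration. The demo probes a throwaway local listener so that it runs without internet access.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Self-check against a temporary local listener (no internet needed).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

reachable_while_open = can_reach("127.0.0.1", port)
listener.close()
reachable_after_close = can_reach("127.0.0.1", port, timeout=1.0)

print("listener open:", reachable_while_open)
print("listener closed:", reachable_after_close)
```

To probe a real broker, call `can_reach(host, port)` with the address from your setup. 1883 is the conventional port for unencrypted MQTT and 8883 for MQTT over TLS, but a given deployment may use something else.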

chaoyanghe avatar Nov 07 '22 18:11 chaoyanghe

I get the same result using MPI. Can we understand it as a necessary step in establishing the communication connection? Right after it, the log shows:

receive_message. msg_type = 3, sender_id = 1, receiver_id = 0
receive_message. msg_type = 3, sender_id = 2, receiver_id = 0

and at that point the training process starts. By the way, did you find out what effect it has?
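To make that reading concrete, here is a toy model of one way such a log could arise. It is purely an assumption for illustration and is not FedML's code: a dispatcher looks up a callback per message type, logs a note when none is registered, and keeps processing later messages. All class, constant and function names are hypothetical.

```python
# Hypothetical message-type constants, chosen to match the numbers in the log.
MSG_TYPE_CONNECTION_READY = 0
MSG_TYPE_MODEL_FROM_CLIENT = 3


class Dispatcher:
    """Toy handler-table dispatcher; not FedML's actual implementation."""

    def __init__(self):
        self.handlers = {}   # msg_type -> callback
        self.log = []        # collected log lines

    def register(self, msg_type, handler):
        self.handlers[msg_type] = handler

    def receive(self, msg_type, sender_id, receiver_id):
        self.log.append(
            f"receive_message. msg_type = {msg_type}, "
            f"sender_id = {sender_id}, receiver_id = {receiver_id}"
        )
        handler = self.handlers.get(msg_type)
        if handler is None:
            # No callback registered: note it and carry on with later messages.
            self.log.append("Cannot handle connection ready")
            return
        handler(sender_id)


received_models = []
dispatcher = Dispatcher()
dispatcher.register(MSG_TYPE_MODEL_FROM_CLIENT, received_models.append)

# Replay the three messages seen in the log.
dispatcher.receive(MSG_TYPE_CONNECTION_READY, 0, 0)
dispatcher.receive(MSG_TYPE_MODEL_FROM_CLIENT, 1, 0)
dispatcher.receive(MSG_TYPE_MODEL_FROM_CLIENT, 2, 0)

print("\n".join(dispatcher.log))
print("models received from clients:", received_models)
```

Under this assumption the message would be harmless, since the uploads from clients 1 and 2 are still handled. Whether FedML actually behaves this way would need to be confirmed in its source or by the maintainers.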

weizj0328 avatar Feb 09 '23 10:02 weizj0328

Greetings! I ran into the same issue while using the MPI protocol in FedNLP/text_classification. Have you succeeded in replacing MPI with gRPC? Do I need to change anything besides the config? The server/client.sh scripts don't seem to fit the cross_silo training type very well :(. I'd really appreciate some advice.

Norsen-Miles avatar Mar 21 '24 10:03 Norsen-Miles