Zhenheng TANG


I also find that the current code for the Dirichlet partition method cannot generate balanced client datasets. This may make training harder.
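To illustrate (a minimal sketch assuming a NumPy label array, not FedML's exact loaders): because each class is split with independently drawn Dirichlet proportions, the per-client totals vary widely, especially for small alpha.

```python
# Minimal sketch of the usual Dirichlet partition; names are illustrative,
# not FedML's actual implementation.
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class, with Dirichlet weights."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        # Proportion of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client, part in zip(client_indices, np.split(cls_idx, splits)):
            client.extend(part.tolist())
    return client_indices

labels = np.random.default_rng(0).integers(0, 10, size=50_000)
parts = dirichlet_partition(labels, num_clients=10, alpha=0.5)
print("client sizes:", [len(p) for p in parts])  # typically far from 5000 each
```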

> @wizard1203 could you help to check if this improves the performance?

Seems it can.

> BytePS is for data center-based distributed training, while FedML (e.g., FedAvg) is edge-based distributed training. The particular assumptions of FL include:
>
> 1. heterogeneous data distribution across devices...

> FedML supports multiple parameter servers for communication efficiency via hierarchical FL and decentralized FL.
> In hierarchical FL, there are group parameter servers that split the total...
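To make the two-level structure concrete, here is a rough sketch of hierarchical FedAvg aggregation; the function names and model representation (dicts of NumPy arrays) are my own assumptions, not FedML's API.

```python
# Rough sketch of two-level (hierarchical) FedAvg aggregation. Each model is
# assumed to be a dict of NumPy arrays; names are illustrative only.
import numpy as np

def weighted_average(models, weights):
    """FedAvg-style weighted average of a list of model state dicts."""
    total = sum(weights)
    return {
        key: sum(w * m[key] for w, m in zip(weights, models)) / total
        for key in models[0]
    }

def hierarchical_round(groups):
    """groups: list of (client_models, client_sample_counts), one per group server.

    Each group parameter server first averages its own clients; the global
    server then averages the group models, weighted by group sample counts.
    """
    group_models, group_sizes = [], []
    for client_models, counts in groups:
        group_models.append(weighted_average(client_models, counts))
        group_sizes.append(sum(counts))
    return weighted_average(group_models, group_sizes)
```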

@chaoyanghe Thanks for your detailed explanation. Maybe I can try to implement it myself, and when I finish it I would like to push it to your master branch.

> @wizard1203 Do you mean modifying based on this code?
> https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/distributed/fedavg

@chaoyanghe No, it probably needs to be based on the code in fedml_core. Anyway, I may try to do...

I met the same problem when using simulation... there are always failures in each round.

![image](https://user-images.githubusercontent.com/22996426/189470503-ce2e1fe5-8086-464a-a818-db504ad14119.png)

I explicitly raise a RuntimeError in the fit() function, but only "5 failures" is printed, without the error log.

![image](https://user-images.githubusercontent.com/22996426/189470549-e6917a30-b5e5-4f32-b21c-20d0e4e29975.png)
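As a workaround (a sketch, not FedML code), one can wrap the trainer's fit() so the worker prints the full traceback itself before the failure is counted; `TracebackTrainer` and the wrapped `trainer` object are hypothetical names.

```python
# Sketch of a debugging workaround: wrap fit() so any exception in a
# simulated worker is printed with its full traceback before re-raising,
# instead of being silently counted as a failure.
import traceback

class TracebackTrainer:
    def __init__(self, trainer):
        self.trainer = trainer  # the real trainer object (hypothetical)

    def fit(self, *args, **kwargs):
        try:
            return self.trainer.fit(*args, **kwargs)
        except Exception:
            traceback.print_exc()  # surface the hidden error log
            raise
```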

@AbdulMoqeet @hangxu0304 Hi, could you try again with a smaller number of local epochs, e.g., E=1? A large number of local epochs usually makes training harder to converge.
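For reference, a minimal sketch of what E means here, assuming a standard PyTorch model and data loader (not FedML's trainer): E is the number of local passes over the client data between aggregations, so a larger E means more local SGD steps and more client drift on non-IID data.

```python
# Sketch of a FedAvg-style local update; `E` is the number of local epochs.
# More local steps between synchronizations let client models drift further
# from the global model on non-IID data, which can hurt convergence.
import torch

def local_update(model, loader, loss_fn, lr=0.01, E=1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(E):          # E local epochs before the next aggregation
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()   # sent to the server for averaging
```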