Xiulong Yuan

Results 27 issues of Xiulong Yuan

### Discussed in https://github.com/openmlsys/openmlsys-zh/discussions/172

Originally posted by **chaoyanghe** March 22, 2022

Hi OpenMLSys Team, This is a nice effort. I am also writing a book for federated and distributed training,...

discussion
to be confirmed
Priority P0

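This issue proposes follow-up improvements and extensions to the [data processing chapter](https://openmlsys.github.io/chapter_data_processing/index.html). Community members with further ideas or suggestions are welcome to reply below.

- [ ] Building the data pipeline and migrating its state under elastic training
- [ ] Handling distributed datasets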

discussion

Elastic and non-elastic training seem to use the same failure-handling strategy: both restart all failed workers and wait until those workers finish loading data. [Source code here](https://github.com/ray-project/xgboost_ray/blob/f88118d44e338ffdba989d47eaf54fca9535deca/xgboost_ray/main.py#L1477)
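
To make the concern concrete, here is a toy sketch contrasting the behavior described above with what one might expect from an elastic mode. It does not call xgboost_ray at all: `Worker`, `load_seconds`, `restart_and_wait`, and `continue_with_survivors` are hypothetical names invented for illustration, and the "expected" elastic behavior is an assumption, not a statement of what the library does.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Worker:
    rank: int
    alive: bool = True
    # Hypothetical: seconds this worker needs to reload its data shard after a restart.
    load_seconds: float = 0.0


def restart_and_wait(workers: List[Worker]) -> Dict[str, object]:
    """Behavior described in the issue: restart every failed worker and
    block training until the slowest one has finished loading data."""
    failed = [w for w in workers if not w.alive]
    blocked = max((w.load_seconds for w in failed), default=0.0)
    return {"active_ranks": [w.rank for w in workers], "blocked_seconds": blocked}


def continue_with_survivors(workers: List[Worker]) -> Dict[str, object]:
    """Assumed elastic behavior: resume at once on the surviving workers
    while failed workers reload in the background."""
    survivors = [w.rank for w in workers if w.alive]
    return {"active_ranks": survivors, "blocked_seconds": 0.0}


workers = [Worker(0), Worker(1, alive=False, load_seconds=30.0), Worker(2)]
blocking = restart_and_wait(workers)
elastic = continue_with_survivors(workers)
print(blocking)
print(elastic)
```

In this toy model the two strategies differ only in whether training blocks on data loading. If both code paths in the linked source block, the elastic option would bring no benefit at failure time, which is the point the issue raises.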