YouhuiBai
@bobzhuyb Thanks a lot. The AWS instances don't support native RDMA, so I will try the EFA implementation.
@bobzhuyb Has the EFA implementation been merged into the master branch of BytePS? Are there any tutorials or benchmarks? Thank you.
@bobzhuyb Thank you very much. I will try it.
@bobzhuyb Hi, I built ps-lite with the `USE_FABRIC=1` flag and ran test_benchmark successfully. But when I then installed BytePS from source and tried to launch a scheduler role, I met...
@ymjiang I installed `libfabric-aws-dev`, as shown below:
@bobzhuyb It works for me after adding `fabric` at L331 of setup.py and adding the EFA include and library paths to `INCLUDES` and `LIBRARY_DIRS`. Thanks a lot!
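For anyone hitting the same build problem, here is a rough sketch of the kind of change described above. It is not the actual BytePS setup.py code: the helper name `add_efa_build_settings`, the `libraries` list, and the default prefix `/opt/amazon/efa` are assumptions for illustration, and the library directory may be `lib64` on some distributions.

```python
import os


def _append_once(items, value):
    """Return a copy of items with value appended unless it is already present."""
    return list(items) if value in items else list(items) + [value]


def add_efa_build_settings(includes, library_dirs, libraries,
                           efa_home="/opt/amazon/efa"):
    """Extend build lists with EFA/libfabric entries (hypothetical helper).

    efa_home is an assumed install prefix; adjust it to wherever libfabric
    is installed on your instance.
    """
    includes = _append_once(includes, os.path.join(efa_home, "include"))
    library_dirs = _append_once(library_dirs, os.path.join(efa_home, "lib"))
    # Link against libfabric, i.e. pass -lfabric to the linker.
    libraries = _append_once(libraries, "fabric")
    return includes, library_dirs, libraries


# Example with empty starting lists standing in for INCLUDES and LIBRARY_DIRS.
INCLUDES, LIBRARY_DIRS, LIBRARIES = add_efa_build_settings([], [], [])
```

The helper is idempotent, so calling it twice does not duplicate entries.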
@bobzhuyb @ymjiang Hi, I met a new problem when running BytePS on multiple AWS instances. The error message is shown in the following figure and is printed only on the workers. I...
@bobzhuyb I have no idea either. I even disabled the ps-lite make option `USE_RDMA=1` and enabled only `USE_FABRIC=1`, and the log messages show that it is creating a fabric van rather than...
@bobzhuyb @ymjiang Hi, did you enable `hierarchical-allreduce` for Horovod in the evaluation of your OSDI '20 paper?
@ymjiang I mean Horovod's hierarchical allreduce rather than NCCL's. Horovod implements hierarchical-allreduce with `ReduceScatter`, `Bcast`, etc., and it can be enabled by an environment variable or a parameter of `horovodrun`.
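To make clear what I mean by hierarchical, here is a minimal pure-Python simulation of the general idea. It is a simplified model under my own assumptions, not Horovod's code: it uses an intra-node reduce-scatter, a cross-node allreduce per local rank, and an intra-node allgather, and it assumes the buffer length is divisible by the number of GPUs per node. The function name `hierarchical_allreduce` is hypothetical.

```python
def hierarchical_allreduce(node_buffers):
    """Simulate a hierarchical sum-allreduce.

    node_buffers[n][g] is the list of numbers held by local GPU g on node n.
    Returns the same nested shape, with every GPU holding the global sum.
    """
    num_nodes = len(node_buffers)
    local_size = len(node_buffers[0])
    length = len(node_buffers[0][0])
    assert length % local_size == 0, "sketch assumes an evenly divisible buffer"
    chunk = length // local_size

    # Step 1: intra-node reduce-scatter.
    # Local rank g ends up with the node-local sum of chunk g.
    shards = [
        [
            [sum(buf[i] for buf in node)
             for i in range(g * chunk, (g + 1) * chunk)]
            for g in range(local_size)
        ]
        for node in node_buffers
    ]

    # Step 2: cross-node allreduce among GPUs that share a local rank.
    reduced = [
        [sum(shards[n][g][i] for n in range(num_nodes)) for i in range(chunk)]
        for g in range(local_size)
    ]

    # Step 3: intra-node allgather reassembles the full reduced buffer.
    full = [x for g in range(local_size) for x in reduced[g]]
    return [[list(full) for _ in range(local_size)] for _ in range(num_nodes)]


# Two nodes with two GPUs each, four elements per GPU.
buffers = [
    [[1, 2, 3, 4], [10, 20, 30, 40]],
    [[100, 200, 300, 400], [1000, 2000, 3000, 4000]],
]
result = hierarchical_allreduce(buffers)
```

In this model only one chunk per local rank crosses the network in step 2, which is the point of the hierarchical scheme.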