Jasper
@ruipeterpan Did you launch any worker in your example above? The server should do nothing but wait when there is no worker.
OK I can reproduce it. I will look at it.
> @vycezhong Could this problem be related to #225?
>
> My experience is that v0.2.4 worked well (e.g., see #271).

Yes. It is because I use `update_buf` for pulling....
@ruipeterpan You also need to enable async for workers.
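For reference, a minimal sketch of what enabling async on a worker might look like. It assumes the switch is the `BYTEPS_ENABLE_ASYNC` environment variable and that it has to be set before BytePS is initialized in the process; the helper name `enable_async` is hypothetical and only for illustration.

```python
import os


def enable_async(env=None):
    """Set the async switch in the given environment mapping.

    Assumption: BytePS reads BYTEPS_ENABLE_ASYNC at start-up, so the same
    value must be present on workers as well as on servers.
    `enable_async` is a hypothetical helper, not part of BytePS.
    """
    env = os.environ if env is None else env
    env["BYTEPS_ENABLE_ASYNC"] = "1"
    return env


# Call this in the worker script before importing and initializing BytePS.
enable_async()
```

Exporting the variable in the launch script of every worker and server should have the same effect.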
@ymjiang I think it should be `Parameter` here? `AsyncParam` in servers will be initialized with random values. https://github.com/bytedance/byteps/blob/6957bc38a112f10cd0bbef576de97719d9fce1a5/byteps/torch/__init__.py#L204
The first incoming `recv` should be random values. https://github.com/bytedance/byteps/blob/6957bc38a112f10cd0bbef576de97719d9fce1a5/byteps/common/operations.cc#L357
@ruipeterpan Please try this commit. https://github.com/bytedance/byteps/pull/359/commits/7ac1dc74335b8935e4ac897e8d92d9c563fdf110
@ymjiang It is because parameter broadcasting also becomes asynchronous. The buffer is initialized with random values as shown in the figure below.  I suggest removing the copy and initialize...
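To illustrate why the initial buffer contents matter, here is a toy model, not BytePS code. It assumes the async server keeps a stored copy of each tensor and adds every pushed delta to it; `ToyAsyncServer` and the numbers are made up for the sketch.

```python
class ToyAsyncServer:
    """Toy stand-in for an async server: one stored tensor, deltas are added."""

    def __init__(self, initial_store):
        # In the real system this would be the server-side buffer.
        self.store = list(initial_store)

    def push_delta_and_pull(self, delta):
        # Accumulate the delta, then hand back the current stored value.
        self.store = [s + d for s, d in zip(self.store, delta)]
        return list(self.store)


def run(initial_store, deltas):
    """Apply all deltas in order and return the last pulled value."""
    server = ToyAsyncServer(initial_store)
    pulled = list(initial_store)
    for delta in deltas:
        pulled = server.push_delta_and_pull(delta)
    return pulled


true_params = [1.0, 2.0, 3.0]
garbage = [123.0, -456.0, 789.0]  # stands in for an uninitialized buffer
deltas = [[0.5, 0.0, -1.0], [0.25, 0.25, 0.25]]

good = run(true_params, deltas)  # store seeded with the real parameters
bad = run(garbage, deltas)       # store seeded with garbage

print(good)
print(bad)
```

In this model the error from a garbage initial store never goes away, because later deltas are only added on top of it.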
> We use AVX instructions to accelerate fp16 summation. Is there any specific problem you observe that shows fp16 dominates the overhead?

Yes. But it still has to be converted...
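The comment above is truncated, so the conversion target is not stated. As a sketch of the general idea only, and assuming the usual pattern of widening fp16 values, adding them in a wider type, and narrowing the result back to fp16, the per-element work looks like this. The function names are hypothetical, and Python's 64-bit float stands in for the wider type.

```python
import struct


def fp16_to_float(raw: bytes) -> float:
    """Decode 2 bytes of little-endian IEEE 754 half precision."""
    return struct.unpack("<e", raw)[0]


def float_to_fp16(value: float) -> bytes:
    """Encode a float as 2 bytes of little-endian half precision (rounds)."""
    return struct.pack("<e", value)


def sum_fp16(a: bytes, b: bytes) -> bytes:
    """Add two fp16 values: widen both, add, narrow the result."""
    return float_to_fp16(fp16_to_float(a) + fp16_to_float(b))


x = float_to_fp16(1.5)
y = float_to_fp16(2.25)
print(fp16_to_float(sum_fp16(x, y)))
```

Each addition here costs two widening conversions and one narrowing conversion on top of the add itself, which is the kind of extra work that remains even when the add is vectorized.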
I see. I just wonder whether fp16 can give a certain (say, 10%) speedup in communication-intensive cases, like VGG-16. The fact that gradient compression involves a lot of computations, where adding...