Jasper

62 comments by Jasper

@ruipeterpan Did you launch any workers in your example above? The server should do nothing but wait when there are no workers.

OK, I can reproduce it. I will look into it.

> @vycezhong Could this problem be related to #225?
>
> My experience is that v0.2.4 worked well (e.g., see #271).

Yes. It is because I use `update_buf` for pulling...

@ruipeterpan You also need to enable async for workers.

@ymjiang I think it should be `Parameter` here, because `AsyncParam` on the servers will be initialized with random values. https://github.com/bytedance/byteps/blob/6957bc38a112f10cd0bbef576de97719d9fce1a5/byteps/torch/__init__.py#L204

The first incoming `recv` would therefore contain random values. https://github.com/bytedance/byteps/blob/6957bc38a112f10cd0bbef576de97719d9fce1a5/byteps/common/operations.cc#L357

@ruipeterpan Please try this commit. https://github.com/bytedance/byteps/pull/359/commits/7ac1dc74335b8935e4ac897e8d92d9c563fdf110

@ymjiang This happens because parameter broadcasting also becomes asynchronous. The buffer is initialized with random values, as shown in the figure below.

![image](https://user-images.githubusercontent.com/25879526/105571484-14754880-5d8b-11eb-8ef0-47153395b1c2.png)

I suggest removing the copy and initialize...

> We use AVX instructions to accelerate fp16 summation. Is there any specific problem you observe that shows fp16 dominates the overhead?

Yes. But it still has to be converted...
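The conversion step mentioned here can be illustrated with a minimal, standard-library Python sketch. It is not BytePS's AVX kernel, only an illustration: each fp16 value is unpacked to a higher-precision float (here Python's fp64 `float`), the sums are taken at that precision, and the results are rounded back to fp16 when packed. A vectorized kernel presumably pays the same convert, add, convert cost per element, just many lanes at a time. The function name `sum_fp16_buffers` is hypothetical.

```python
import struct


def sum_fp16_buffers(a: bytes, b: bytes) -> bytes:
    """Element-wise sum of two little-endian fp16 buffers (illustrative only).

    Packing fails if a sum exceeds the fp16 range.
    """
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("buffers must have equal, even byte lengths")
    n = len(a) // 2
    fmt = f"<{n}e"  # 'e' is the IEEE 754 half-precision struct format
    # Step 1: convert fp16 -> Python float (fp64).
    xs = struct.unpack(fmt, a)
    ys = struct.unpack(fmt, b)
    # Step 2: accumulate at the higher precision.
    sums = [x + y for x, y in zip(xs, ys)]
    # Step 3: convert back to fp16; struct rounds to the nearest half value.
    return struct.pack(fmt, *sums)


if __name__ == "__main__":
    a = struct.pack("<3e", 1.5, -0.5, 2.0)
    b = struct.pack("<3e", 2.25, 0.25, 4.0)
    print(struct.unpack("<3e", sum_fp16_buffers(a, b)))
```

For example, summing buffers holding `(1.5, -0.5, 2.0)` and `(2.25, 0.25, 4.0)` yields `(3.75, -0.25, 6.0)`, all exactly representable in fp16. The two conversions are pure overhead relative to summing fp32 buffers directly.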

I see. I just wonder whether fp16 can give a noticeable speedup (say, 10%) in communication-intensive cases such as VGG-16. The fact that gradient compression involves a lot of computation, where adding...
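To put the VGG-16 case in numbers, here is a back-of-the-envelope sketch. It assumes VGG-16's commonly cited parameter count of 138,357,544 and a hypothetical 10 Gbps link, and it models transfer time from bandwidth alone (no latency, no overlap with computation, no conversion cost). Halving the bytes per element halves this lower bound; whether that becomes a 10% end-to-end gain depends on how much of each iteration is actually communication-bound. The helper names are hypothetical.

```python
# Back-of-the-envelope estimate: bandwidth-only model that ignores latency,
# overlap with compute, and fp16 conversion overhead.
VGG16_PARAMS = 138_357_544  # commonly cited VGG-16 parameter count (assumption)


def gradient_megabytes(num_params: int, bytes_per_elem: int) -> float:
    """Size of one full gradient in MB (10**6 bytes)."""
    return num_params * bytes_per_elem / 1e6


def ideal_transfer_seconds(num_params: int, bytes_per_elem: int, gbps: float) -> float:
    """Lower bound on the time to send one full gradient over a link of `gbps`."""
    bits = num_params * bytes_per_elem * 8
    return bits / (gbps * 1e9)


if __name__ == "__main__":
    link_gbps = 10.0  # hypothetical link speed, for illustration only
    for name, width in (("fp32", 4), ("fp16", 2)):
        mb = gradient_megabytes(VGG16_PARAMS, width)
        secs = ideal_transfer_seconds(VGG16_PARAMS, width, link_gbps)
        print(f"{name}: {mb:.1f} MB, >= {secs:.3f} s at {link_gbps:g} Gbps")
```

Under these assumptions the script reports about 553.4 MB and 0.443 s for fp32 versus 276.7 MB and 0.221 s for fp16 per full gradient exchange.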