
Why does the MoE layer force the input hidden dimension to be the same as the output hidden dimension?

Open hobbitlzy opened this issue 3 years ago • 2 comments

I noticed a problem when reading the MoE tutorial. As the figure shows, it says the input dimension must be equal to the output dimension. This adds complexity for many DNNs, and I don't understand why it is necessary. In my view, the output dimension should be freely configurable?

hobbitlzy avatar Jul 14 '22 08:07 hobbitlzy

@ykim362 -- I know it's a very old issue, but do you mind explaining this here?

awan-10 avatar Aug 16 '22 21:08 awan-10

I think this is because the initial implementation is based on NLP applications, following GShard and Switch. We make a certain assumption about tensor dimensions: batch and sequence length come first in the input tensors.

ykim362 avatar Aug 16 '22 21:08 ykim362
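To illustrate why that assumption arises (this is my own sketch, not DeepSpeed's actual code): in GShard/Switch-style Transformers, the MoE layer replaces the position-wise FFN, whose output is added back to its input through a residual connection, so the expert's output dimension must match the input hidden dimension. A minimal NumPy sketch of a single expert, with hypothetical dimension names:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, ffn_dim = 8, 32  # hypothetical sizes for illustration

# A single expert, shaped like a standard Transformer FFN:
# it expands to ffn_dim, then must project back to hidden_dim.
W1 = rng.standard_normal((hidden_dim, ffn_dim))
W2 = rng.standard_normal((ffn_dim, hidden_dim))

def expert(x):
    # ReLU FFN: (tokens, hidden_dim) -> (tokens, hidden_dim)
    return np.maximum(x @ W1, 0.0) @ W2

x = rng.standard_normal((4, hidden_dim))  # 4 tokens
y = x + expert(x)  # residual add only works because shapes match
print(y.shape)     # (4, hidden_dim)
```

If the expert's output dimension differed from `hidden_dim`, the residual add `x + expert(x)` would fail, which is why the NLP-oriented implementation bakes in equal input and output dimensions.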

Hi,

I have fixed this issue in this PR: https://github.com/microsoft/DeepSpeed/pull/2530/files.

mhjabreel avatar Nov 22 '22 08:11 mhjabreel