fastmoe

A bug in switch_gate

Open Heihaierr opened this issue 11 months ago • 6 comments

Describe the bug

In fmoe/gates/switch_gate.py, line 45 reads:

capacity = math.ceil(cap_rate * inp.shape[0])

Should it be the following instead?

capacity = math.ceil(cap_rate * inp.shape[0] / self.num_expert)
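To make the difference concrete, here is a minimal sketch that evaluates both formulas side by side. The helper names and the sample values (128 tokens, 5 experts, cap_rate of 1.2) are assumptions chosen for illustration and are not taken from the fastmoe source.

```python
import math


def capacity_current(cap_rate, num_tokens):
    # Formula as quoted from switch_gate.py: scales with the whole batch.
    return math.ceil(cap_rate * num_tokens)


def capacity_proposed(cap_rate, num_tokens, num_expert):
    # Formula proposed in this issue: scales with the average load per expert.
    return math.ceil(cap_rate * num_tokens / num_expert)


# Illustrative values only (assumptions, not read from fastmoe).
num_tokens, num_expert, cap_rate = 128, 5, 1.2

print("current :", capacity_current(cap_rate, num_tokens))
print("proposed:", capacity_proposed(cap_rate, num_tokens, num_expert))
```

If the capacity is read as a per-expert limit, the current formula with cap_rate >= 1 gives a value at least as large as the number of tokens in the batch, so the limit could never cause a token to be dropped. The proposed formula ties the limit to the average number of tokens per expert.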

Mar 07 '24 11:03 Heihaierr

That is a good point. I think you are right. Can you please open a pull request on this? Thanks.

BTW, I am also wondering if the capacity calculation in GShardGate is wrong. @zms1999

Mar 11 '24 07:03 laekov

Hi, guys!
Thanks for your fantastic work. I ran into a problem when using the SwitchGate class. Could you take a look at it for me?

The following is my code:

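import torch
from fmoe.gates import SwitchGate

device = torch.device("cuda:0")

# world_size=2 is the setting that triggers the crash described below
sg = SwitchGate(d_model=64, num_expert=5, world_size=2)
sg = sg.to(device)

# Random batch of shape (batch_size, d_model), created directly on the GPU
inp = torch.rand(128, 64, device=device)

# Gate outputs: expert indices (idx) and their scores (val)
idx, val = sg(inp)
print(idx, idx.shape)
print(val, val.shape)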

The world_size parameter can only be set to 1; any other value fails with the error "Segmentation fault (core dumped)".

Apr 06 '24 04:04 Peg-Wu

@Peg-Wu As you are not using torch distributed, world_size has to be 1.
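One way to avoid hard-coding the value is to derive it from the state of torch.distributed. The sketch below assumes plain PyTorch; `resolve_world_size` is a hypothetical helper, not part of fastmoe.

```python
import torch.distributed as dist


def resolve_world_size():
    # Hypothetical helper: report the real world size only when a process
    # group has been initialized; otherwise fall back to 1.
    if dist.is_available() and dist.is_initialized():
        return dist.get_world_size()
    return 1


# In a plain single-process script no process group exists, so this gives 1.
world_size = resolve_world_size()
print(world_size)

# The result could then be passed to the gate from the snippet above, e.g.
# SwitchGate(d_model=64, num_expert=5, world_size=world_size)
```

In a script launched without torch.distributed the helper returns 1, which matches the constraint described above.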

Apr 07 '24 10:04 laekov

Thank you for your reply!

If I want to use DDP for acceleration, how should I modify the code? Can I use PyTorch's official DDP?

Apr 08 '24 04:04 Peg-Wu

@Peg-Wu Refer to this test

Apr 08 '24 07:04 laekov

Thank you very much!

Apr 08 '24 07:04 Peg-Wu
