fastmoe
A fast MoE impl for PyTorch
**Describe the bug** In `fmoe/gates/switch_gate.py`, line 45: `capacity = math.ceil(cap_rate * inp.shape[0])`. Should this be `capacity = math.ceil(cap_rate * inp.shape[0] / self.num_expert)` instead?
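To make the difference between the two expressions concrete, here is a minimal sketch that evaluates both for one batch. The function names `capacity_as_quoted` and `capacity_per_expert` and the sample values are hypothetical and chosen only for illustration; the sketch does not claim which formula the library intends, only how far apart the two results are.

```python
import math


def capacity_as_quoted(cap_rate: float, num_tokens: int) -> int:
    # Expression quoted from the issue: scales with the whole batch.
    return math.ceil(cap_rate * num_tokens)


def capacity_per_expert(cap_rate: float, num_tokens: int, num_expert: int) -> int:
    # Expression proposed in the issue: the batch is first split
    # evenly across experts, then scaled by the capacity rate.
    return math.ceil(cap_rate * num_tokens / num_expert)


if __name__ == "__main__":
    # Hypothetical values: 1024 tokens, 8 experts, capacity rate 1.2.
    num_tokens, num_expert, cap_rate = 1024, 8, 1.2
    print("as quoted: ", capacity_as_quoted(cap_rate, num_tokens))
    print("per expert:", capacity_per_expert(cap_rate, num_tokens, num_expert))
```

With the quoted expression, a single expert could accept more tokens than the batch contains whenever `cap_rate` is above 1, so the limit would never bind; the proposed expression caps each expert near its even share of the batch.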
I am trying to use the FMoE layer in a ViT-Base model for a simple classification task. However, there is a gradual increase in CUDA memory, which eventually leads to...
Hi, I'm trying to implement a simpler version of `switch transformer` following your work. But some details of `switch_gate`, such as `limit_by_capacity`, are not visible. My implementation has a slightly different result...
Hello, I am using two machines with 8 GPUs each, with 1 expert and top_k=1: `model = FMoETransformerMLP(num_expert=1,d_model=d_model,d_hidden=d_model, world_size=torch.distributed.get_world_size(),top_k=1)` Training pseudocode: `backbone_ddp = fmoe.DistributedGroupedDataParallel(model,device_ids) .... .... backbone_ddp.allreduce_params() optm.step()` This should give 16*1 experts running in parallel, right? `File "/usr/local/python3.7.1/lib/python3.7/site-packages/fastmoe-1.1.0-py3.7-linux-x86_64.egg/fmoe/gates/naive_gate.py", line 33, in forward` `gate, k=self.top_k, dim=-1, largest=True, sorted=False RuntimeError: invalid argument 5:...
Hi, thanks for the exciting work! I want to use the parallel methods when running Megatron, but it seems there isn't an example/script to run Megatron and I cannot find a...
Hello, I ran into this problem when using domaingate. Is it a version conflict?
**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd...
When I use a custom expert that inherits the FMoE class and turn on smart schedule, an error is reported: ``` [ubuntu:8399 :0:8399] Caught signal 11 (Segmentation fault: invalid permissions for...