laekov


If you would like to communicate via an IM tool, you can join our Slack workspace to talk with everyone. The invitation link is in the README.

To run FastMoE with Megatron, you should use Megatron's own entry script, e.g. `pretrain_gpt.py`, with FastMoE's patch applied.

You should use the patch that matches your Megatron version. The key step to enable MoE is adding the `--fmoefy` argument when launching `pretrain_xxx.py`; see the sketch below for what this ends up doing.
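
For reference, when `--fmoefy` is set, the patched script converts the Megatron model with FastMoE's `fmoefy` helper. A minimal sketch of that call (it assumes your FastMoE version exposes `fmoe.megatron.fmoefy` with a `fmoe_num_experts` keyword, as in the README; `model_provider` is a hypothetical stand-in for Megatron's model construction):

```python
from fmoe.megatron import fmoefy

def build_moe_model(model_provider, num_experts_per_worker=4):
    # model_provider is a placeholder for however pretrain_gpt.py builds the
    # Megatron model; fmoefy then replaces its MLP blocks with MoE layers.
    model = model_provider()
    return fmoefy(model, fmoe_num_experts=num_experts_per_worker)
```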

The cause of this problem appears to be that `k` has become 5 by the time execution reaches `naive_gate.py:33`, which is odd. Could you check in Python where this `k` becomes 5? Thanks.

> It's not clear to me what input and output that need to be constrained means

The input and output features have to be of the same length for...
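
In other words, an expert has to map features of length `d_model` back to features of length `d_model`. A minimal, FastMoE-agnostic sketch of a module that satisfies this constraint (the class is illustrative, not part of the FastMoE API):

```python
import torch
import torch.nn as nn

class SameWidthExpert(nn.Module):
    """Illustrative expert: input and output feature lengths are both d_model."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)  # must project back to d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model) -> (num_tokens, d_model)
        return self.fc2(torch.relu(self.fc1(x)))
```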

> if the result that local_expert_count gets on each card (worldsize) is the same or different

`local_expert_count` differs on each GPU, because it includes the counters of samples in the...
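
To make that concrete, here is a minimal, FastMoE-agnostic sketch of how a per-GPU expert count can be derived from the gate's top-k indices; because each GPU only sees its own tokens, the resulting histogram differs across GPUs (function name and shapes are illustrative):

```python
import torch

def local_expert_count(topk_idx: torch.Tensor, tot_experts: int) -> torch.Tensor:
    """Count how many local samples are routed to each (global) expert.

    topk_idx: (num_local_tokens, top_k) expert indices produced by the gate
    on this GPU only, so the resulting counts differ from GPU to GPU.
    """
    return torch.bincount(topk_idx.flatten(), minlength=tot_experts)

# Example: 3 local tokens, top_k = 2, 4 experts in total.
idx = torch.tensor([[0, 2], [2, 3], [0, 0]])
print(local_expert_count(idx, 4))  # tensor([3, 0, 2, 1])
```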

For using a customized expert module, see #121 as a reference. For customized gates, you can refer to our gate implementations, e.g. [NaiveGate](https://github.com/laekov/fastmoe/blob/master/fmoe/gates/naive_gate.py). You can then feed the class into `FMoE`...
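
As a rough sketch of how the pieces fit together (the constructor arguments of `NaiveGate`, the `gate=` keyword, and the shape handling of `FMoETransformerMLP` below follow my reading of the current code, so please verify against your checkout; a GPU is needed because the scatter/gather kernels are CUDA-only):

```python
import torch
from fmoe import FMoETransformerMLP
from fmoe.gates.naive_gate import NaiveGate


class MyGate(NaiveGate):
    """Illustrative custom gate: currently identical to NaiveGate.
    Put your own routing logic in forward()."""
    pass


# The gate is passed as a class; the MoE layer instantiates it internally.
moe = FMoETransformerMLP(num_expert=4, d_model=512, d_hidden=1024,
                         gate=MyGate, top_k=2).cuda()
x = torch.randn(8, 16, 512, device="cuda")   # (batch, seq, d_model)
y = moe(x)                                    # output keeps the input shape
assert y.shape == x.shape
```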

> but I get an error when I execute stored_models_[i], there is no way to get its value, but it is possible to print its size

`stored_models_` is the output...

Well, thank you very much for reporting this issue and debugging it. I think we should explicitly specify the device of tensors when we allocate them in our library. We will...
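
For context, the fix described above amounts to pinning newly allocated tensors to the device of the incoming data rather than relying on the current default device. A generic PyTorch sketch (not the actual FastMoE code; the helper name is made up):

```python
import torch

def allocate_counters(inp: torch.Tensor, num_expert: int) -> torch.Tensor:
    # Bad: lands on the current default device, which may not match `inp`:
    #   counts = torch.zeros(num_expert, dtype=torch.long)
    # Good: explicitly follow the device of the tensor it will be used with.
    return torch.zeros(num_expert, dtype=torch.long, device=inp.device)
```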

The pruning function is implemented [here](https://github.com/laekov/fastmoe/blob/b477ab5edc5142f0e86f6d35bce9a4361c369a6b/cuda/balancing.cuh#L23).