
A fast MoE impl for PyTorch

Results: 25 fastmoe issues, sorted by recently updated

I find that the moe is 0, but I don't know why. ![image](https://github.com/laekov/fastmoe/assets/51167745/97b92338-ef8f-494f-a73f-94530da91cf1)

No function named get_args() in megatron. The functions in `__init__` don't include get_args: r""" A set of modules to plugin into Megatron-LM with FastMoE """ from .utils import add_fmoe_args from...
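A minimal compatibility check, assuming the mismatch comes from where the installed Megatron-LM exposes get_args(): older releases export it from the package root, while newer ones moved it under megatron.training. The fallback path below is an assumption to verify against the Megatron version being patched.

```python
# Hedged sketch: locate get_args() across Megatron-LM versions.
try:
    from megatron import get_args            # classic layout
except ImportError:
    from megatron.training import get_args   # newer layout (assumption)

# Must be called after Megatron's initialization has populated the arguments.
args = get_args()
```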

The training process proceeds smoothly; however, an issue arises during inference as the **noise_stddev** becomes zero when **self.training** is False, leading to an error when computing the **load**. Should we...
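A sketch of the kind of guard being asked about, assuming a noisy-gate style load computation. The function and tensor names are illustrative rather than fastmoe's actual internals, and the top-k probability is simplified to a single threshold per token.

```python
import torch

def load_with_eval_guard(noisy_logits, noise_stddev, top_logits, k, training):
    # When training is False the noise_stddev is zero, so the smooth
    # "probability of staying in the top-k" estimate would divide by zero;
    # fall back to a hard per-expert count instead.
    threshold = top_logits[:, k - 1].unsqueeze(1)   # k-th largest logit per token
    if training and bool((noise_stddev > 0).all()):
        normal = torch.distributions.Normal(0.0, 1.0)
        load = normal.cdf((noisy_logits - threshold) / noise_stddev).sum(0)
    else:
        load = (noisy_logits >= threshold).float().sum(0)
    return load
```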

**Describe the bug** I adapted fmoe into Megatron following the tutorial and want to run a script to train GPT. But when I run ```pretrain_gpt.sh```, it raises an error called...
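For context, a rough sketch of the adaptation step from the tutorial. The keyword name fmoe_num_experts and the model construction call are assumptions to check against the installed fastmoe and Megatron versions.

```python
# Hedged sketch of the tutorial's adaptation step: after building the Megatron
# GPT model, replace its MLP blocks with fastmoe experts.
from fmoe.megatron import fmoefy

model = model_provider()                   # placeholder for Megatron's model construction
model = fmoefy(model, fmoe_num_experts=4)  # keyword name is an assumption
```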

How to apply the balance loss? Can you add it to the 'transformer-xl' example?
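A minimal sketch of one way to wire the balance loss in, assuming the gate (e.g. SwitchGate or GShardGate) stores its auxiliary loss and exposes it through BaseGate.get_loss() as in recent fastmoe versions; the 0.01 weight and the module scan are illustrative choices, not part of the transformer-xl example.

```python
def add_balance_loss(task_loss, model, balance_weight=0.01):
    """Hedged sketch: fold each MoE gate's balance loss into the task loss.

    Assumes the gate exposes its stored auxiliary loss via get_loss(clear=True);
    verify against the installed fastmoe version.
    """
    balance_loss = 0.0
    for module in model.modules():
        gate = getattr(module, "gate", None)
        if gate is not None and hasattr(gate, "get_loss"):
            gate_loss = gate.get_loss(clear=True)
            if gate_loss is not None:
                balance_loss = balance_loss + gate_loss
    return task_loss + balance_weight * balance_loss

# Usage inside the training step (sketch):
#   loss = add_balance_loss(criterion(model(data), target), model)
#   loss.backward()
```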

**Describe the bug** I am trying to create a minimal runnable example of the Smart Scheduling proposed in the FasterMoE paper. However, when I profile the example using Nsight Systems, it...
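For reference, a sketch of the kind of minimal script meant here, assuming Smart Scheduling is toggled with the FMOE_FASTER_SCHEDULE_ENABLE environment variable (set before fmoe is imported) as described in the FasterMoE docs; the layer sizes, the torchrun launch, and the nsys command line are illustrative.

```python
import os

# Assumption: FasterMoE-style smart scheduling is enabled via this environment
# variable, which must be set before fmoe is imported.
os.environ.setdefault("FMOE_FASTER_SCHEDULE_ENABLE", "1")

import torch
import torch.distributed as dist
from fmoe.transformer import FMoETransformerMLP

def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # One MoE FFN layer with experts distributed across all ranks, so the
    # all-to-all exchange that smart scheduling overlaps actually happens.
    layer = FMoETransformerMLP(num_expert=2, d_model=1024, d_hidden=4096,
                               world_size=dist.get_world_size()).cuda()

    x = torch.randn(8, 512, 1024, device="cuda")   # (batch, seq_len, d_model)
    for _ in range(10):                            # a few iterations for the timeline
        layer(x).sum().backward()
    torch.cuda.synchronize()

if __name__ == "__main__":
    # e.g.  nsys profile -t cuda,nvtx torchrun --nproc_per_node=2 smart_sched_min.py
    main()
```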

I notice the L2 norm for experts is reduced twice in the model parallel group, please see: https://github.com/laekov/fastmoe/blob/cd8372b3a8a5e73d46d2b463ec30995631cfc181/examples/megatron/clip-grad-v2.2.patch#L44C2-L44C2. It would be a good idea to add up the squared gradients of all...
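A hedged sketch of the single-reduction alternative being suggested: sum the squared gradients of the sharded expert parameters locally, all-reduce that sum exactly once over the model parallel group, then take the square root. Which parameters count as expert parameters and which group to reduce over are assumptions; the clip-grad patch linked above is the reference.

```python
import torch
import torch.distributed as dist

def expert_grad_norm(expert_params, mp_group):
    device = torch.cuda.current_device() if torch.cuda.is_available() else "cpu"
    sq_sum = torch.zeros(1, device=device)
    for p in expert_params:
        if p.grad is not None:
            sq_sum += p.grad.detach().float().norm(2) ** 2
    # A single all-reduce; reducing this sum again in another (overlapping)
    # group would double-count the expert contributions, which is the concern.
    dist.all_reduce(sq_sum, op=dist.ReduceOp.SUM, group=mp_group)
    return sq_sum.sqrt()
```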

**Describe the bug** When running the transformer-XL example on enwik8, the log shows there are only 204 unique tokens (vocabulary size) in the enwik8 training set. **To Reproduce** Steps to reproduce...
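A quick sanity check for the reported number, assuming a character-level split; the data path below is a placeholder for the example's actual enwik8 training file.

```python
# Hedged sketch: count the distinct symbols the data loader would see.
from collections import Counter

with open("data/enwik8/train.txt", "rb") as f:   # placeholder path (assumption)
    counts = Counter(f.read())

print(f"{len(counts)} distinct byte values in the training split")
```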

Thank you for providing an end-to-end framework to train MoE systems. I would like to ask whether I can use this for vision tasks, in the case...