fastmoe
A fast MoE impl for PyTorch
Hi, thanks for providing an end-to-end training framework in PyTorch for MoE models. We have recently implemented MoE in TensorFlow and found that categorizing experts into different groups can...
**Describe the bug** When I use `export FMOE_FASTER_SHADOW_ENABLE=1` and `export FMOE_FASTER_SCHEDULE_ENABLE=1` to turn on Smart Schedule, and then run `bash examples/transformer-xl/scripts/run_enwik8_base_moe.sh train`, it...
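For reference, the same two switches can be set from the training script itself. This is a minimal sketch, assuming fmoe reads these environment variables when it is loaded, so they must be set before the import:

```python
import os

# Turn on FastMoE's Smart Schedule, mirroring the two `export` lines above.
# Assumption: fmoe picks these variables up at load time, so set them
# before importing anything from fmoe.
os.environ["FMOE_FASTER_SHADOW_ENABLE"] = "1"
os.environ["FMOE_FASTER_SCHEDULE_ENABLE"] = "1"

import fmoe  # noqa: E402  (imported after the environment is prepared)
```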
It seems FastMoE still cannot achieve running multiple experts in parallel on a single GPU card?
**Is your feature request related to a problem? Please describe.** My model is trained with the DeepSpeed + ZeRO-2 framework. I want to know how I can use fastmoe to train...
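A minimal sketch of the wiring in question: a FastMoE layer inside a model that is then handed to DeepSpeed with a ZeRO stage-2 config. The toy model, sizes, and optimizer settings here are hypothetical placeholders, and whether ZeRO-2's gradient/optimizer partitioning is compatible with FastMoE's expert parameters is exactly the open question, so treat this as the wiring, not a verified recipe:

```python
import deepspeed
import torch
from fmoe import FMoETransformerMLP

class ToyBlock(torch.nn.Module):
    """Hypothetical single-layer model using FastMoE's FFN replacement."""
    def __init__(self, d_model=512, d_hidden=2048, num_expert=4):
        super().__init__()
        # FastMoE's drop-in MoE replacement for a Transformer FFN sublayer.
        self.moe_ffn = FMoETransformerMLP(
            num_expert=num_expert, d_model=d_model, d_hidden=d_hidden
        )

    def forward(self, x):
        return self.moe_ffn(x)

model = ToyBlock()
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # ZeRO-2, as in the question above
}
# DeepSpeed wraps the model; expert parameters are left to ZeRO's default
# partitioning here, which is the part that may need special handling.
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```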
Problem encountered when running pretrain_gpt.py after applying the patch:

```
Traceback (most recent call last):
  File "pretrain_gpt.py", line 126, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/workspace/Megatron-LM/megatron/training.py", line 157, in pretrain
    iteration = train(forward_step_func,
  File "/workspace/Megatron-LM/megatron/training.py", line 630, in train
    train_step(forward_step_func,...
```