megablocks
Hello, I am using EleutherAI's gpt-neox implementation with megablocks, but I get two errors related to `_LOAD_BALANCING_LOSS`. 1. `tokens_per_expert` gives me this error at this [line](https://github.com/databricks/megablocks/blob/f1a83bd55413b02b472696b719646cf22732d070/megablocks/layers/moe.py#L34): `ValueError:...
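For context, here is a minimal, untested sketch of how the load-balancing-loss helpers in `megablocks/layers/moe.py` are typically driven from a training loop. The `model`, `batch`, and `lm_loss_fn` names are hypothetical, and the `Arguments` fields shown are assumptions; the actual call pattern inside gpt-neox may differ.

```python
from megablocks.layers import moe
from megablocks.layers.arguments import Arguments

# Assumed setup: one Arguments object shared by every MoE layer in the model.
args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
    num_layers=12,
)

def training_step(model, batch, lm_loss_fn):
    # During the forward pass, each MoE layer stashes its
    # (tokens_per_expert, expert_scores) pair in the module-level
    # _LOAD_BALANCING_LOSS list via moe.save_load_balancing_loss(...).
    logits = model(batch)
    lm_loss = lm_loss_fn(logits, batch)

    # Sums the auxiliary loss over all saved layer entries. The ValueError
    # referenced in the issue is raised here when the number of saved
    # entries does not match the expected number of layers, e.g. if the
    # list was not cleared between steps or pipeline settings disagree.
    aux_loss = moe.batched_load_balancing_loss(args)

    loss = lm_loss + aux_loss
    loss.backward()

    # Clear the saved entries once per step so they do not accumulate.
    moe.clear_load_balancing_loss()
    return loss
```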
I'm training models with the specs below but seeing a major throughput drop when switching to GLU. Do you know why, or have ideas about what I could investigate? Thanks a lot!...
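For reference, a hedged sketch of the kind of configuration being compared, assuming GLU experts are selected through an `mlp_type` field on `megablocks.layers.arguments.Arguments` (the field name and values are assumptions; check `megablocks/layers/arguments.py` in your version).

```python
from megablocks.layers.arguments import Arguments

# Baseline: standard two-matrix MLP experts.
mlp_args = Arguments(
    hidden_size=2048,
    ffn_hidden_size=8192,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_type="mlp",  # assumed default
)

# GLU experts: each expert uses a gate projection in addition to the
# up/down projections, so at the same ffn_hidden_size it runs roughly
# 1.5x the FLOPs and an extra grouped GEMM per layer, which is one
# plausible source of a throughput gap.
glu_args = Arguments(
    hidden_size=2048,
    ffn_hidden_size=8192,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_type="glu",  # assumed value
)
```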
I'm finding that training a 1-expert dMoE (brown) has worse training loss than an otherwise equivalent dense model (green). Is there some reason why this difference is expected or can...
When I run `pip install megablocks` I get this: ``` clang: error: unsupported option '--ptxas-options=-v' clang: error: unsupported option '--generate-code=arch=compute_90,code=sm_90' ```
I am getting the error below on the first step of multinode training with dMoE. Meanwhile, multinode MoE training and single-node dMoE training work fine. Any ideas what the problem...
@tgale96 The JetMoE technical report describes how they used Megablocks with Megatron to train the model. The author then shared [this](https://github.com/yikangshen/megablocks) fork of megablocks used during the training....
I would like to request ScatterMoE support in Megablocks https://arxiv.org/abs/2403.08245 https://github.com/shawntan/scattermoe
I followed these steps: run `docker build . -t megablocks-dev`, then `bash docker.sh` to launch the container. When I run `moe_46m_8gpu.sh` to test, it reported the...
Given that MegaBlocks is highly optimized for sparse MoE models like Mixtral, I am requesting support for a variant recently termed MoDE by Google DeepMind. Benefits include much faster...
Is it possible to import the dMoE model itself into another training script without training via Megatron?
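A minimal, untested sketch of what such a standalone use might look like, assuming the `dMoE` layer and `Arguments` class exported from `megablocks/layers`; the argument field names are assumptions and may differ between versions.

```python
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

# Assumed argument names; check megablocks/layers/arguments.py for the
# fields available in your installed version.
args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
)

# The dropless kernels are assumed to require a GPU and half precision.
layer = dMoE(args).cuda().half()

# Use it like any other torch.nn.Module inside a custom training loop.
x = torch.randn(8, 512, 1024, device="cuda", dtype=torch.half)
out = layer(x)
# Depending on the version, the forward pass may return an (output, bias)
# tuple following the Megatron MLP convention.
```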