megablocks
Hi, this is awesome work. I'm wondering if there is a minimal way to integrate megablocks into the transformers codebase for the Mixtral architecture. Would simply replacing the [`MixtralSparseMoeBlock`](https://github.com/huggingface/transformers/blob/aa4a0f8ef37eb5d42b4e3810f37e554585c90d41/src/transformers/models/mixtral/modeling_mixtral.py#L854) with...
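For what it's worth, a minimal sketch of the kind of swap being asked about might look like the following. It only shows construction: the `Arguments` fields come from `megablocks.layers.arguments`, but the mapping from the HF Mixtral config and the helper itself are assumptions, not a confirmed integration path. Note also that `MixtralSparseMoeBlock.forward` returns both hidden states and router logits, which any drop-in replacement would have to match.

```python
# A minimal sketch, assuming megablocks' dMoE layer and Arguments dataclass;
# the config-field mapping below is a guess, not a verified integration.
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE


def build_dmoe_from_mixtral_config(config):
    # Hypothetical helper: translate a HF MixtralConfig into megablocks
    # Arguments. The field names on both sides exist; whether this mapping
    # alone is sufficient for a drop-in replacement is untested.
    args = Arguments(
        hidden_size=config.hidden_size,
        ffn_hidden_size=config.intermediate_size,
        moe_num_experts=config.num_local_experts,
        moe_top_k=config.num_experts_per_tok,
    )
    return dMoE(args)
```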
It's more a question than an issue. The tensor [w2](https://github.com/stanford-futuredata/megablocks/blob/main/megablocks/layers/mlp.py#L341C9-L341C50) of the `SparseMLP` class has the same shape as w1. Is this because of the DSD operation, i.e., does it require...
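To make the shape question concrete, here is a dense analogue of the sparse path, a sketch under the assumption that the forward pass computes roughly `sdd(x, w1.t(), topo)` followed by `dsd(h, w2)`. Because the intermediate activations come out with the ffn dimension on the columns, w2 multiplies as stored, so giving it the same (ffn, hidden) layout as w1 is exactly what the DSD product needs.

```python
# Dense analogue of the w1/w2 path -- a shapes-only sketch, not megablocks code.
# In the real layer, stk's block-sparse sdd/dsd kernels replace the two matmuls.
import torch

tokens, hidden, ffn = 4, 8, 32
w1 = torch.randn(ffn, hidden)  # stored as (ffn_hidden, hidden)
w2 = torch.randn(ffn, hidden)  # same layout as w1
x = torch.randn(tokens, hidden)

h = torch.relu(x @ w1.t())  # sdd analogue: w1 is transposed at use -> (tokens, ffn)
out = h @ w2                # dsd analogue: w2 multiplies as stored -> (tokens, hidden)
assert out.shape == (tokens, hidden)
```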
While working with the `load_checkpoint` function in the file `third_party/Megatron-LM/megatron/checkpointing.py`, I noticed that the condition on [line 585](https://github.com/stanford-futuredata/Megatron-LM/blob/3a9e3d8de308e6f6398b59d16a8bd7177374f121/megatron/checkpointing.py#L585), `if args.fp16 and optimizer is not None:`, should be modified...
Hi, the paper's results are very impressive, but I notice the comparison is against top-1 routing. Do you have results against top-2 routing? That would make the comparison more challenging...
Hi, I see that there is a script for training Mixtral, but not one for fine-tuning. Could you please provide one? The whole community is having a lot of issues...
# What does this PR do?
bump torch to
# Before submitting
- [ ] Have you read the [contributor guidelines](https://github.com/databricks/megablocks/blob/dev/CONTRIBUTING.md)?
- [ ] Is this change a documentation change...
# What does this PR do?
Add type checking.
# What issue(s) does this change relate to?
# Before submitting
- [ ] Have you read the [contributor guidelines](https://github.com/databricks/megablocks/blob/dev/CONTRIBUTING.md)?
- ...
```
iteration 1000/ 20000 | consumed samples: 512000 | elapsed time per iteration (ms): 336.3 | learning rate: 1.495E-04 | global batch size: 512 | load balancing loss: 9.743530E-02 | lm...
```