megablocks
Hi, I saw you mentioned that you used your fork of Megatron-LM for training - could you please provide scripts and hyperparams used for the SFT of DBRX? It would...
Hi there, great work with dMoE! I'm trying to test dMoE with regular DDP + PyTorch AMP (BF16) and I get the following error:

```bash
optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/miniconda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line...
```
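A likely cause, sketched below under the assumption that the failure is in `GradScaler.unscale_`: the scaler exists to rescale fp16 gradients, and with BF16 autocast no scaler is needed at all, since BF16 has roughly the dynamic range of fp32. A minimal BF16 training step without a scaler (shown on CPU for portability; on GPU you would use `device_type="cuda"`):

```python
import torch

# Minimal sketch: BF16 autocast without GradScaler. GradScaler assumes fp16
# gradients that need unscaling; with bf16 the usual pattern is autocast alone.
model = torch.nn.Linear(8, 8)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 8)

opt.zero_grad()
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()
loss.backward()  # no scaler.scale(loss) / scaler.unscale_() step
opt.step()
```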
The base Megatron-LM repo provides unsharding scripts for its models, which can be used after training. I didn't find any such scripts in the repo. Would it be possible to...
The loss function is always `moe_loss_func`, as can be seen [here](https://github.com/stanford-futuredata/Megatron-LM/blob/3a9e3d8de308e6f6398b59d16a8bd7177374f121/pretrain_gpt.py#L128). But the loss is only calculated during training, as can be seen [here](https://github.com/stanford-futuredata/megablocks/blob/f05609ce69c1e1a7dd008c49cf435ef74df84b69/megablocks/layers/moe.py#L427-L428). We should fall back to the original...
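The mismatch described above can be reproduced with a toy module; the class and attribute names here are illustrative, not megablocks' actual API. The point is that a layer which records its auxiliary loss only under `self.training` leaves nothing for an eval-time loss function to read:

```python
import torch

# Hedged sketch of the behavior the issue describes: an MoE-like layer that
# only accumulates its load-balancing loss while in training mode.
class ToyMoE(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.saved_lbl = None  # stand-in for the saved load-balancing loss

    def forward(self, x):
        out = self.linear(x)
        if self.training:  # mirrors the training-only guard linked above
            self.saved_lbl = out.abs().mean()  # stand-in for the real lb loss
        return out

moe = ToyMoE()
moe.eval()
moe(torch.randn(2, 4))
assert moe.saved_lbl is None  # eval: no auxiliary loss was recorded
```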
Thanks for the excellent work. Following the comment in #59, I am trying to train `dmoe_760m` using 16 GPUs (2 nodes) by changing distributed arguments to set up for two...
To my understanding -- and please correct me if I am wrong about this -- there is no mechanism to selectively compute routing logits in fp32, as is suggested in...
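For concreteness, the technique being asked about can be sketched as follows; `FP32Router` is a hypothetical class, not megablocks' router implementation. The idea is to keep the router weights in fp32 and disable autocast around the logits/softmax so the expert scores are computed in full precision even when the surrounding network runs in BF16:

```python
import torch

# Hedged sketch: compute routing logits in fp32 inside a bf16 autocast region.
class FP32Router(torch.nn.Module):
    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        # router weights stay in fp32
        self.layer = torch.nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # disable autocast and upcast activations so the softmax over
        # expert scores runs in full precision
        with torch.autocast(device_type="cpu", enabled=False):
            logits = self.layer(x.float())
            return torch.softmax(logits, dim=-1)

router = FP32Router(8, 4)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    probs = router(torch.randn(4, 8))
```

(`device_type="cpu"` keeps the sketch runnable anywhere; on GPU it would be `"cuda"`.)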
I loaded the same model trained with Megatron + megablocks, and found that the load_balancing_loss is slightly different. When I increase pipeline_parallel_size, the load_balancing_loss also increases. Is it...
When I run `pip install megablocks`, I seem to be getting this error: `RuntimeError: ('Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and...
What the title says. In `layers/dmoe.py`:

```python
class ParallelDroplessMLP(moe.ParallelMLP):
    def __init__(self, args: Arguments):
        super(ParallelDroplessMLP, self).__init__(args)
        #
```