LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Hello authors. I tried to train LLaMA-MLP-MoE (2/8). After two stages of training, the model cannot output normal sentences. The inference script is as follows:

```python
model_dir = ""
tokenizer...
```
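For reference, a minimal inference sketch along these lines, assuming the checkpoint loads through `transformers` with custom model code; the `model_dir` path, dtype, and generation settings are placeholders, not the reporter's actual script:

```python
# Minimal inference sketch (assumed setup, not the reporter's exact script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/path/to/llama-moe-v2-checkpoint"  # hypothetical path

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

prompt = "Give me a short introduction to large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```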
Thanks for your excellent work! I came across your paper and noticed that the gates are initialized using K-Means, which seems quite innovative. However, the paper does not mention the...
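The paper is the authoritative reference here, but one plausible reading of "K-Means gate initialization" is to cluster a sample of token hidden states into one group per expert and use the centroids as the router's weight rows. A rough sketch of that idea, with all shapes, names, and the calibration data assumed for illustration:

```python
# Illustrative guess at K-Means-based gate (router) initialization:
# cluster hidden states into num_experts groups and copy the centroids
# into the router weights. Not the authors' actual implementation.
import torch
from sklearn.cluster import KMeans

hidden_size, num_experts = 4096, 8

# `hidden_states`: [num_tokens, hidden_size] activations collected from a
# small calibration set (random here, assumed to be real data in practice).
hidden_states = torch.randn(10_000, hidden_size)

kmeans = KMeans(n_clusters=num_experts, n_init=10, random_state=0)
kmeans.fit(hidden_states.float().numpy())

# The router is a single linear layer mapping hidden states to expert logits.
gate = torch.nn.Linear(hidden_size, num_experts, bias=False)
with torch.no_grad():
    centroids = torch.tensor(kmeans.cluster_centers_, dtype=gate.weight.dtype)
    gate.weight.copy_(centroids)  # one centroid per expert
```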
## What's New

Add [megablocks](https://github.com/databricks/megablocks) support for MLP MoE. The dumping & reloading test passes, verified by observing a continuous loss decline, but further downstream metrics have not been tested. Please use...
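For context, a dump-and-reload check of that kind can be sketched generically as below; the toy model, optimizer, and thresholds are placeholders, and nothing here is megablocks-specific:

```python
# Generic dump & reload sanity check: train a few steps, save, rebuild and
# reload, then confirm the loss keeps declining after the reload.
import torch

def make_model():
    return torch.nn.Linear(16, 1)

def train_steps(model, opt, n_steps):
    losses = []
    for _ in range(n_steps):
        x = torch.randn(64, 16)
        y = x.sum(dim=1, keepdim=True)
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

model = make_model()
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
before = train_steps(model, opt, 200)

# Dump model and optimizer state, then reload into fresh objects.
torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, "ckpt.pt")
ckpt = torch.load("ckpt.pt")
model2 = make_model()
model2.load_state_dict(ckpt["model"])
opt2 = torch.optim.AdamW(model2.parameters(), lr=1e-2)
opt2.load_state_dict(ckpt["opt"])

after = train_steps(model2, opt2, 200)
# After reloading, the loss should continue from (and stay below) the
# pre-dump level rather than jumping back up.
assert sum(after[-20:]) / 20 <= sum(before[-20:]) / 20
print("reload check passed: loss continued to decline")
```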