
Mixtral 8x22B support

Open SergioG-M opened this issue 1 year ago • 4 comments

Is there any plan to add support for Mixtral 8x22B?

SergioG-M avatar May 27 '24 08:05 SergioG-M

We already support Mixtral 8x7B, so adding Mixtral 8x22B should not be too difficult unless there are unexpected changes between these two architectures besides their different sizes. It'd be nice to add it some time.
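One way to check whether the two architectures differ only in size is to diff their Hugging Face `config.json` fields. The values below are from memory and should be double-checked against the files on the hub; this is just a sketch of the comparison:

```python
# Sketch: diff the HF config fields of Mixtral 8x7B vs 8x22B to see whether
# anything beyond the size-related hyperparameters changed.
# Values are from memory -- verify against config.json on the hub.

mixtral_8x7b = {
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "intermediate_size": 14336,
    "num_local_experts": 8,
    "num_experts_per_tok": 2,
    "rope_theta": 1e6,
}
mixtral_8x22b = {
    "hidden_size": 6144,
    "num_hidden_layers": 56,
    "num_attention_heads": 48,
    "num_key_value_heads": 8,
    "intermediate_size": 16384,
    "num_local_experts": 8,
    "num_experts_per_tok": 2,
    "rope_theta": 1e6,
}

changed = {k for k in mixtral_8x7b if mixtral_8x7b[k] != mixtral_8x22b[k]}
print(sorted(changed))
# -> ['hidden_size', 'intermediate_size', 'num_attention_heads', 'num_hidden_layers']
```

If only the size fields differ, as the diff above suggests, the existing 8x7B code path should carry over with a new config entry.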

rasbt avatar May 27 '24 11:05 rasbt

Looking at the hub, I think the config we need for this is

    dict(
        name="Mixtral-8x22B-{}v0.1",
        hf_config=dict(org="mistralai", name="Mixtral-8x22B-{}v0.1"),
        padded_vocab_size=32000,
        block_size=65536,
        n_layer=56,
        n_query_groups=8,
        rotary_percentage=1.0,
        parallel_residual=False,
        bias=False,
        norm_class_name="RMSNorm",
        norm_eps=1e-05,
        mlp_class_name="LLaMAMoE",
        intermediate_size=16384,
        n_head=48,  # double-check
        rope_base=1000000,
        n_expert=8,
        n_expert_per_token=2,
    ),

but I haven't double-checked yet. It's a lot of weights (>600GB) to download.

rasbt avatar May 28 '24 19:05 rasbt

Any update on this?

SergioG-M avatar Jun 11 '24 13:06 SergioG-M

Right now, with so many other things to do, this hasn't been on my priority list. But we'd be happy about contributions if you are interested in adding this model.

rasbt avatar Jun 11 '24 15:06 rasbt