Add support for GLM-4.7
- [ ] Basic model support (auto parallel with pipeline)
- [ ] Tensor parallel
With the latest main, I was able to add a card for mlx-community/GLM-4.7-8bit-gs32:
```python
"glm-4.7-8bit-gs32": ModelCard(
    short_id="glm-4.7-8bit-gs32",
    model_id=ModelId("mlx-community/GLM-4.7-8bit-gs32"),
    name="GLM 4.7 (8-bit group-size 32, MLX)",
    description="""GLM-4.7 converted to MLX format with 8-bit weights and group size 32. This is one of the highest-fidelity MLX variants of GLM-4.7 available and is optimized for Apple Silicon / MLX unified memory execution.""",
    tags=["glm", "glm4.7", "mlx", "8bit", "gs32"],
    metadata=ModelMetadata(
        model_id=ModelId("mlx-community/GLM-4.7-8bit-gs32"),
        pretty_name="GLM 4.7 (8-bit GS32, MLX)",
        # approximate disk footprint for the large GLM-4.7 8-bit gs32 variant
        storage_size=Memory.from_gb(182),
        n_layers=92,
        hidden_size=5120,
        supports_tensor=True,
    ),
),
```
Everything worked without any changes other than adding the card.
Please also include 5-bit and 6-bit in the model cards when they're available. I have 2x256GB M3 Ultras, and sometimes 8-bit doesn't leave much room for large context.
will do
There doesn't seem to be a 5bit available for GLM-4.7, but we've got 4bit and 6bit in the model cards in #1147
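For anyone sizing a quant against their hardware, here is a rough back-of-envelope sketch. It scales the 182 GB 8-bit gs32 figure from the model card above linearly with bit width; the helper name and the estimates are illustrative only (real quantized checkpoints also carry group scales, unquantized embeddings, etc., so actual sizes will differ somewhat):

```python
# Back-of-envelope quantized-weight footprint, scaling the known 8-bit size.
# BASE_GB_8BIT comes from the model card above; everything else is an estimate.
BASE_GB_8BIT = 182.0

def est_size_gb(bits: int, base_gb: float = BASE_GB_8BIT) -> float:
    """Estimate disk/memory footprint by scaling the 8-bit size by bits/8.

    Ignores per-group scale overhead and any unquantized layers, so treat
    the result as a lower-bound ballpark, not an exact number.
    """
    return base_gb * bits / 8

for bits in (4, 5, 6, 8):
    print(f"{bits}-bit: ~{est_size_gb(bits):.0f} GB")
```

On a 2x256GB setup this suggests even the 8-bit variant leaves a fair amount of headroom in aggregate, but on a single 256GB node the 4-bit or 6-bit variants leave far more room for KV cache at long context.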
Thanks @Evanev7