Add support for GLM-4.7
- [ ] Basic model support (auto parallel with pipeline)
- [ ] Tensor parallel
With the latest main, I was able to add a card for mlx-community/GLM-4.7-8bit-gs32:
```python
"glm-4.7-8bit-gs32": ModelCard(
    short_id="glm-4.7-8bit-gs32",
    model_id=ModelId("mlx-community/GLM-4.7-8bit-gs32"),
    name="GLM 4.7 (8-bit group-size 32, MLX)",
    description="""GLM-4.7 converted to MLX format with 8-bit weights and group size 32. This is one of the highest-fidelity MLX variants of GLM-4.7 available and is optimized for Apple Silicon / MLX unified memory execution.""",
    tags=["glm", "glm4.7", "mlx", "8bit", "gs32"],
    metadata=ModelMetadata(
        model_id=ModelId("mlx-community/GLM-4.7-8bit-gs32"),
        pretty_name="GLM 4.7 (8-bit GS32, MLX)",
        # approximate disk footprint for the large GLM-4.7 8-bit gs32 variant
        storage_size=Memory.from_gb(182),
        n_layers=92,
        hidden_size=5120,
        supports_tensor=True,
    ),
),
```
Everything worked without any changes other than adding the card.
Please also include 5-bit and 6-bit in the model cards when they're available. I have 2x256GB M3 Ultras, and sometimes 8-bit doesn't leave much room for large context.
will do
There doesn't seem to be a 5bit available for GLM-4.7, but we've got 4bit and 6bit in the model cards in #1147
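For anyone sizing a quant against their hardware, here is a rough back-of-envelope sketch. It scales the 182 GB 8-bit gs32 figure from the model card above linearly with bit width; the helper name and the estimates are illustrative only (real quantized checkpoints also carry group scales, unquantized embeddings, etc., so actual sizes will differ somewhat):

```python
# Back-of-envelope quantized-weight footprint, scaling the known 8-bit size.
# BASE_GB_8BIT comes from the model card above; everything else is an estimate.
BASE_GB_8BIT = 182.0

def est_size_gb(bits: int, base_gb: float = BASE_GB_8BIT) -> float:
    """Estimate disk/memory footprint by scaling the 8-bit size by bits/8.

    Ignores per-group scale overhead and any unquantized layers, so treat
    the result as a lower-bound ballpark, not an exact number.
    """
    return base_gb * bits / 8

for bits in (4, 5, 6, 8):
    print(f"{bits}-bit: ~{est_size_gb(bits):.0f} GB")
```

On a 2x256GB setup this suggests even the 8-bit variant leaves a fair amount of headroom in aggregate, but on a single 256GB node the 4-bit or 6-bit variants leave far more room for KV cache at long context.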
Thanks @Evanev7