[MOE Quantization] Warn against "undercalibrated" modules
Note: this branch requires https://github.com/neuralmagic/compressed-tensors/pull/46 to land in compressed-tensors first.
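Conceptually, the check compares the number of calibration tokens routed to each expert projection against the min_tokens_per_group fraction of the total calibration batch tokens, and warns for any module that falls below it. Below is a minimal, hypothetical sketch of that idea; the helper names (attach_token_counters, warn_undercalibrated) and the hook-based counting are illustrative assumptions, not the actual modifier code:

import logging
from collections import defaultdict

import torch

_LOGGER = logging.getLogger(__name__)


def attach_token_counters(model: torch.nn.Module, counts: defaultdict):
    # Count how many routed tokens each expert linear layer sees during calibration
    handles = []
    for name, module in model.named_modules():
        if "experts" in name and isinstance(module, torch.nn.Linear):
            def _hook(mod, inputs, output, name=name):
                # inputs[0] is (num_routed_tokens, hidden_dim) for a routed expert
                counts[name] += inputs[0].shape[0]
            handles.append(module.register_forward_hook(_hook))
    return handles


def warn_undercalibrated(counts: dict, total_tokens: int, min_tokens_per_group: float = 0.3):
    # Warn for every expert module that saw fewer than the minimum fraction of tokens
    threshold = min_tokens_per_group * total_tokens
    for name, seen in counts.items():
        if seen < threshold:
            _LOGGER.warning(
                f"{name} received only {seen}/{total_tokens} calibration tokens "
                f"(< {min_tokens_per_group:.0%}); quantization quality may suffer."
            )

Counting at the expert projections themselves, rather than at the router, keeps the warning tied to exactly what each quantized module observed during calibration.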
Example Use:
from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer, oneshot
import torch

model_name = "Isotonic/TinyMixtral-4x248M-MoE"

model = SparseAutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cuda:0",
    torch_dtype=torch.float16,
)
tokenizer = SparseAutoTokenizer.from_pretrained(model_name)

dataset = "open-platypus"
recipe = "tests/sparseml/transformers/compression/recipes/new_quant_full.yaml"

oneshot(
    model=model,
    dataset=dataset,
    overwrite_output_dir=True,
    output_dir="./output_one_shot",
    recipe=recipe,
    num_calibration_samples=4,
    pad_to_max_length=False,
    min_tokens_per_group=0.3,
)
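With min_tokens_per_group=0.3 and only num_calibration_samples=4, the run flags every expert projection that received fewer than 30% of the calibration batch tokens: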
2024-05-15 12:15:13 sparseml.transformers.finetune.runner INFO *** One Shot ***
2024-05-15 12:15:14 sparseml.core.recipe.recipe INFO Loading recipe from file tests/sparseml/transformers/compression/recipes/new_quant_full.yaml
/root/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_name" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
/root/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_kwargs" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
2024-05-15 12:15:14 sparseml.modifiers.quantization_vllm.pytorch INFO Running vLLMQuantizationModifier calibration with 4 samples...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.19it/s]
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.0.block_sparse_moe.experts.1.w1 received less than 30% of calibration batch tokens (212/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.0.block_sparse_moe.experts.1.w2 received less than 30% of calibration batch tokens (212/970 tokens). This may harm the quantization quality.
...
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.10.block_sparse_moe.experts.3.w3 received less than 30% of calibration batch tokens (233/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.11.block_sparse_moe.experts.2.w1 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.11.block_sparse_moe.experts.2.w2 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.11.block_sparse_moe.experts.2.w3 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
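When these warnings appear, increasing num_calibration_samples or using a larger, more diverse calibration dataset gives the router more opportunities to send tokens to each expert and may clear them.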