AQLM
How model_seqlen affects quantization quality
Hi!
Thanks for such a useful tool!
I have a question about model_seqlen:
As I can see, the default value in main.py is 4096. What if I use a smaller value, e.g. 1024, when quantizing a MoE Mixtral model? Will it affect the quality of the quantized model, or the quality at contexts longer than 1024? Will it significantly speed up the quantization process?
Thanks in advance!
parser.add_argument(
    "--model_seqlen",
    type=int,
    default=4096,
    help="Model seqlen and calibration data context length.",
)
Hi! It is recommended to use the seq_len the model you're quantizing was trained on (4096 for Llama-2, 8192 for Mistral/Mixtral). To reduce the number of samples and speed up computation, you should decrease --nsamples instead.
However, it doesn't have that large an impact on the quantization time anyway.
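For context, here is a rough sketch of how the two flags typically interact in calibration-based quantization; this is not the repository's actual data-loading code, and the function name and tokenizer setup are illustrative assumptions. The calibration corpus is tokenized once and then cut into --nsamples chunks of --model_seqlen tokens each, which is why --nsamples is the knob that controls how much calibration data is processed.

```python
import torch
from transformers import AutoTokenizer


def get_calibration_batches(texts, model_name, nsamples=1024, model_seqlen=4096):
    # Illustrative helper (not part of AQLM): slice a text corpus into
    # `nsamples` calibration sequences of `model_seqlen` tokens each.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Concatenate the corpus into one long token stream.
    ids = tokenizer("\n\n".join(texts), return_tensors="pt").input_ids[0]
    batches = []
    for i in range(nsamples):
        start = i * model_seqlen
        end = start + model_seqlen
        if end > ids.numel():
            break  # ran out of calibration text
        batches.append(ids[start:end].unsqueeze(0))  # shape: (1, model_seqlen)
    return batches
```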
@BlackSamorez Thanks for the answer!
I am trying to quantize a finetuned version of Mixtral, and I had no samples that long (8192) in the training set.
Should I then decrease max_epochs and finetune_max_epochs instead (in order to speed up the process)?
@VirtualRoyalty you may try and see how shorter sequences affect the quality. When I was tuning Mixtral, I used 7k instead of 8k to fit into memory, and that seemed to work fine. However, 1k is much shorter than 8k, so I cannot say a priori whether it matters much.
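To make "try and see" concrete, one way to check is to measure perplexity of the quantized model at the full 8k context and compare it against the original. A minimal sketch is below; it is not part of AQLM, and the model path, evaluation text, and 8192-token window are placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


@torch.no_grad()
def long_context_perplexity(model_path, text, seqlen=8192, device="cuda"):
    # Hypothetical sanity check: perplexity over non-overlapping windows of the
    # target context length, to see whether calibrating on shorter sequences
    # hurt long-context quality.
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16
    ).to(device)
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    nlls = []
    for start in range(0, ids.shape[1] - seqlen + 1, seqlen):
        window = ids[:, start : start + seqlen]
        out = model(window, labels=window)  # loss is mean next-token NLL
        nlls.append(out.loss)
    return torch.exp(torch.stack(nlls).mean()).item()
```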
@Godofnothing Thanks, good point!
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.