lm-evaluation-harness
Support `quantization_config` argument on HF backend
With AutoAWQ, we can fuse layers for a 2-3x speedup simply by passing a quantization_config. If this argument is supported, it becomes possible to evaluate quantized models much faster.
An example config:
from transformers import AwqConfig

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512,
    do_fuse=True,
)
https://huggingface.co/docs/transformers/v4.36.1/en/quantization#fusing-modules-for-supported-architectures
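For illustration, the config above would be consumed roughly like this (the model ID below is just a placeholder for any AWQ-quantized checkpoint):

from transformers import AutoModelForCausalLM

# Passing the AwqConfig at load time triggers module fusing in transformers.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-OpenOrca-AWQ",  # placeholder AWQ checkpoint
    quantization_config=quantization_config,
)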
We would be glad to support this feature!
It looks as though we should already support it out of the box, when quantization_config is in the model's HF config.json. (modulo potential issues arising due to us attempting to place the model onto a device manually?)
Regarding passing a quantization_config kwarg to from_pretrained(), we don't currently have a way to pass --model_args quantization_config=<nested dict of sub-values>, so some changes would need to be made to allow us to supply such a nested config via the CLI.
Another option would be to have a magic prefix such that any --model_args autogptq_* arg would be passed to init a GPTQConfig, and vice-versa for awq_* args going to AWQConfig. Given that these configs can be doubly-nested, though, this seems annoying.
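For concreteness, a rough sketch of what that prefix routing could look like (the helper below is hypothetical, not existing harness code):

from transformers import AwqConfig, GPTQConfig

# Hypothetical helper: route prefixed --model_args entries into a quant config,
# e.g. {"awq_bits": 4, "awq_do_fuse": True} -> AwqConfig(bits=4, do_fuse=True).
def build_quantization_config(model_args: dict):
    prefixes = {"awq_": AwqConfig, "autogptq_": GPTQConfig}
    for prefix, config_cls in prefixes.items():
        matched = {k[len(prefix):]: v for k, v in model_args.items() if k.startswith(prefix)}
        if matched:
            # Flat key=value pairs only: doubly-nested fields (e.g. AWQ's
            # modules_to_fuse dict) don't fit this scheme, hence the annoyance.
            return config_cls(**matched)
    return None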
Would you be willing to test this functionality (as well as perhaps testing the full range of GPTQConfig values) and contribute a PR to the library? @casper-hansen
> It looks as though we should already support it out of the box, when quantization_config is in the model's HF config.json. (modulo potential issues arising due to us attempting to place the model onto a device manually?)
Although it is possible for the user to put this into the config, it requires extra steps. It would be much easier if it can be passed in programmatically because then I can add support for it in AutoAWQ.
> Regarding passing a quantization_config kwarg to from_pretrained(), we don't currently have a way to pass --model_args quantization_config=<nested dict of sub-values>, so some changes would need to be made to allow us to supply such a nested config via the CLI.
>
> Another option would be to have a magic prefix such that any --model_args autogptq_* arg would be passed to init a GPTQConfig, and vice-versa for awq_* args going to AWQConfig. Given that these configs can be doubly-nested, though, this seems annoying.
>
> Would you be willing to test this functionality (as well as perhaps testing the full range of GPTQConfig values) and contribute a PR to the library? @casper-hansen
I am at capacity in terms of work on open source, so unfortunately I do not have time to implement this functionality myself but would be happy to test it when support is provided.
Thanks nevertheless for raising this issue, it's much appreciated!
I will see about adding support for this via the route of allowing --model_args arg1=<string dict of values we'll call json.loads on>,arg2=..., though I may not prioritize it.
If any other contributors would like to help out, please don't hesitate to comment or assign yourself!
Hi @haileyschoelkopf
Are you actively working on this issue?
Please let me know if I can contribute to this issue.
I’m not, @mahimairaja go for it!
The places to adapt are lm_eval.models.huggingface.HFLM and lm_eval.utils.simple_parse_args_string() (the latter to trigger json.loads on {} characters in an arg's value string).
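Roughly, the parser change could look like the sketch below; this is a from-scratch illustration, not the actual lm_eval.utils implementation (which also does things like bool/number coercion):

import json

# Sketch: parse "k1=v1,k2={...json...}" into a dict, JSON-decoding any value
# that starts with "{". Commas only split arguments at brace depth 0, so
# commas inside a nested JSON object don't break arguments apart.
def simple_parse_args_string(args_string: str) -> dict:
    pieces, depth, start = [], 0, 0
    for i, ch in enumerate(args_string):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            pieces.append(args_string[start:i])
            start = i + 1
    pieces.append(args_string[start:])
    args = {}
    for piece in filter(None, pieces):
        key, _, value = piece.partition("=")
        args[key.strip()] = json.loads(value) if value.startswith("{") else value
    return args

# e.g. simple_parse_args_string('pretrained=gpt2,quantization_config={"bits": 4}')
# -> {"pretrained": "gpt2", "quantization_config": {"bits": 4}}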
Thanks Hailey, looking forward to it!
Hi @mahimairaja , how is this going? do you need any help with it?
HF has a new interface for adding quantization methods, which any extended quantization integration should look at and support: https://huggingface.co/docs/transformers/main/en/hf_quantizer
Actually, because we already pass arbitrary keyword arguments through to AutoModelForCausalLM.from_pretrained(), this is already supported.
You can therefore use the library programmatically with any quantization_config initialized or defined as a dict and then passed into HFLM's init, similar to the example in https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage !
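For example, something along these lines should work (the model name is a placeholder; adjust tasks as needed):

import lm_eval
from lm_eval.models.huggingface import HFLM
from transformers import AwqConfig

# Extra kwargs to HFLM are forwarded to AutoModelForCausalLM.from_pretrained().
lm = HFLM(
    pretrained="TheBloke/Mistral-7B-OpenOrca-AWQ",  # placeholder AWQ checkpoint
    quantization_config=AwqConfig(bits=4, fuse_max_seq_len=512, do_fuse=True),
)
results = lm_eval.simple_evaluate(model=lm, tasks=["lambada_openai"])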
Separate from this, we'll consider whether and how to allow users to pass nested configs through the CLI. Tracking this in #1366 .
At a certain level of complexity, simply using a Python script may make more sense.