南栖
I tried it again and it worked. Code:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from transformers import LlamaForCausalLM, LlamaTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
    ...
```
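In case it helps others, here is a minimal self-contained sketch of what a full load-and-generate flow can look like with auto_gptq; the checkpoint path and generation settings below are placeholders of mine, not taken from the snippet above:

```python
# Hedged sketch: load a GPTQ-quantized checkpoint and run one generation.
# The model path is a placeholder, not the model used in this thread.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
    level=logging.INFO,
)

model_path = "TheBloke/Llama-2-7B-Chat-GPTQ"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_path, device="cuda:0", use_safetensors=True
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```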
It's ok in quant.py: `weights = (self.scales[self.g_idx.long()] * (weight - zeros[self.g_idx.long()]))`
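For readers unfamiliar with that line, here is a self-contained sketch of the group-wise dequantization it performs; the shapes and values are illustrative, not taken from quant.py:

```python
# Illustrative sketch of group-wise GPTQ dequantization:
# each input row is rescaled with its group's scale and zero point.
import torch

in_features, group_size, out_features = 8, 4, 3
n_groups = in_features // group_size

weight = torch.randint(0, 16, (in_features, out_features)).float()  # int4 codes
scales = torch.rand(n_groups, out_features)                         # per-group scales
zeros = torch.randint(0, 16, (n_groups, out_features)).float()      # per-group zero points
g_idx = torch.arange(in_features) // group_size                     # row -> group map

weights = scales[g_idx.long()] * (weight - zeros[g_idx.long()])
print(weights.shape)  # torch.Size([8, 3])
```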
+1, same error.
Switch the transformers version, e.g. to 4.37.2.
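If you're on pip, that downgrade would be something like:

```
pip install transformers==4.37.2
```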
It's available at this branch: https://github.com/Minami-su/attention_sinks_autogptq @synacktraa
And then I tried this: `pip install git+https://github.com/tomaarsen/attention_sinks.git@model/qwen_fa`. This error happened:
```
The repository for Qwen-7B-Chat2 contains custom code which must be executed to correctly load the model. You can inspect the repository...
```
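That message is the standard transformers prompt for repositories that ship custom modeling code. As a hedged sketch (assuming the attention_sinks drop-in Auto classes, with `Qwen-7B-Chat2` taken verbatim from the error text), passing `trust_remote_code=True` is the usual way past it:

```python
# Sketch, not a confirmed fix: allow the repo's custom modeling code to run.
# attention_sinks exposes drop-in replacements for transformers' Auto classes.
from attention_sinks import AutoModelForCausalLM
from transformers import AutoTokenizer

name = "Qwen-7B-Chat2"  # path as it appears in the error message
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)
```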