
Results: 2 comments by po13on

When I load the model in 4-bit and set `model.cfg.use_split_qkv_input = True`, this bug is triggered.

**Code Example:**

```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    proxies=proxies,
    local_files_only=False,
    low_cpu_mem_usage=True,
    use_safetensors=False,
    load_in_4bit=True,
    torch_dtype=torch.float32,
)
...
```

@bryce13950 I'm sorry for providing incomplete code. The model I loaded is vicuna-7b. Below is the complete code:

```python
model_name = 'lmsys/vicuna-7b-v1.3'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    proxies=proxies,
    local_files_only=False,
    low_cpu_mem_usage=True,
    use_safetensors=False,
    load_in_4bit=True,
    ...
```
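For context, `use_split_qkv_input` is a TransformerLens config flag, not a `transformers` one, so the snippets above only cover the Hugging Face loading side. A minimal sketch of the step the comments leave out, assuming the quantized model is wrapped with `HookedTransformer.from_pretrained` (the exact wrapping call used in the original report is an assumption, not shown in the comments):

```python
import torch
from transformers import AutoModelForCausalLM
from transformer_lens import HookedTransformer

model_name = "lmsys/vicuna-7b-v1.3"

# Load the Hugging Face model in 4-bit, as in the comments above.
hf_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    load_in_4bit=True,
    torch_dtype=torch.float32,
)

# Wrap the quantized model in TransformerLens (assumed step, not in the
# original comments).
model = HookedTransformer.from_pretrained(model_name, hf_model=hf_model)

# Enabling split QKV input is what triggers the reported bug; this is the
# helper equivalent of setting model.cfg.use_split_qkv_input = True directly.
model.set_use_split_qkv_input(True)
```

Reproducing this requires downloading vicuna-7b and a bitsandbytes-capable GPU, so it is a sketch of the trigger path rather than a self-contained test case.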