localGPT
Error when running TheBloke/Nous-Hermes-13B-GPTQ
Before running the application I:
- downloaded the latest code from the repo (main branch)
- re-installed requirements.txt, just in case ;-)
- ran ingest.py; additional files were downloaded, including pytorch_model.bin
- ran the application with run_localGPT.py (the only change in the code was setting model_id to "TheBloke/Nous-Hermes-13B-GPTQ"; see the sketch below)
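For context, the edit looked roughly like this (a sketch only; the exact loading class run_localGPT.py uses may differ, but the error message below suggests it goes through transformers' from_pretrained):

# the only change: point model_id at the GPTQ repo
model_id = "TheBloke/Nous-Hermes-13B-GPTQ"

# The non-quantized loading path behaves roughly like this call, which searches the repo
# for pytorch_model.bin / tf_model.h5 / model.ckpt / flax_model.msgpack (per the error
# below) and therefore fails on a repo that only ships GPTQ .safetensors weights:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_id)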
The application eventually fails with the error message: "OSError: TheBloke/Nous-Hermes-13B-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack."
It looks like the loader is searching for full-precision weight files (pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack), while the GPTQ repo only ships quantized .safetensors weights, so none of them are found.
Do you know how to fix this?
The current code doesn't support quantized models, but this is coming soon :)
Yes, sorry: I tested the code from pull request #131 before it was merged and had a successful run with a 7B GPTQ model on my computer, but forgot to close this ticket. The model responds almost instantly now. I think we can close the ticket :-)
May I know how to solve this issue? I run:

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    device=DEVICE,
)

generation_config = GenerationConfig.from_pretrained(model_name_or_path)
But it returns:
FileNotFoundError                         Traceback (most recent call last)
/home/tianyu/code/kefu/Get-Things-Done-with-Prompt-Engineering-and-LangChain/10.customer-support-chatbot-with-open-llm-and-langchain.ipynb Cell 19 line 6
      2 model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"
      4 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
----> 6 model = AutoGPTQForCausalLM.from_quantized(
      7     model_name_or_path,
      8     model_basename=model_basename,
      9     use_safetensors=True,
     10     trust_remote_code=True,
     11     device=DEVICE,
     12 )
     14 generation_config = GenerationConfig.from_pretrained(model_name_or_path)

File ~/miniconda3/envs/kefu/lib/python3.9/site-packages/auto_gptq/modeling/auto.py:108, in AutoGPTQForCausalLM.from_quantized(cls, model_name_or_path, device_map, max_memory, device, low_cpu_mem_usage, use_triton, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, disable_exllama, **kwargs)
    102 # TODO: do we need this filtering of kwargs? @PanQiWei is there a reason we can't just pass all kwargs?
    103 keywords = {
    104     key: kwargs[key]
    105     for key in list(signature(quant_func).parameters.keys()) + huggingface_kwargs
    106     if key in kwargs
    107 }
--> 108 return quant_func(
    109     model_name_or_path=model_name_or_path,
    110     device_map=device_map,
    111     max_memory=max_memory,
...
--> 791     raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
    793 model_save_name = resolved_archive_file
    795 if not disable_exllama and trainable:
FileNotFoundError: Could not find model in TheBloke/Nous-Hermes-13B-GPTQ
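If it helps, that FileNotFoundError is raised when auto-gptq cannot resolve a weight file matching model_basename inside the repo. Below is a minimal diagnostic sketch (assuming huggingface_hub is installed; deriving model_basename from the file listing is an illustrative workaround, not an official fix) to check which quantized file the repo actually contains:

from huggingface_hub import list_repo_files

repo_id = "TheBloke/Nous-Hermes-13B-GPTQ"

# List every file in the Hugging Face repo to see which quantized
# weight file is really there.
files = list_repo_files(repo_id)
print(files)

# model_basename should match a .safetensors file in the repo,
# minus the .safetensors extension.
candidates = [f for f in files if f.endswith(".safetensors")]
print(candidates)

if candidates:
    model_basename = candidates[0][: -len(".safetensors")]
    print("model_basename =", model_basename)

If the printed basename differs from "nous-hermes-13b-GPTQ-4bit-128g.no-act.order" (the weight file may have been renamed upstream), passing the printed value, or omitting model_basename so auto-gptq tries to infer it, may resolve the lookup, depending on your auto-gptq version.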