Austin
It's defaulting to CPU instead of GPU even though GPU is set. I'm looking into it. As an FYI, the 7B HF models need ~48 GB of memory.
@PromtEngineer It's because of the way I refactored the `from_pretrained` [method call on L89](https://github.com/teleprint-me/localGPT/blob/dev/localGPT/model.py#L89).

```sh
20:24:17 | ~/Documents/code/git/localGPT git:(dev | Δ)
λ python -m localGPT.run
2023-06-28 20:24:28,557 - INFO -...
```
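A quick way to confirm the CPU fallback (a sketch; `model` stands in for whatever the refactored loader returns, not a name from the repo):

```py
import torch

# Sanity checks for the CPU fallback (sketch; `model` is the object
# returned by the loader under test).
print(torch.cuda.is_available())        # True means a CUDA device is visible
print(next(model.parameters()).device)  # "cpu" here confirms the fallback
```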
@PromtEngineer What arguments did you use? It errors out for me if I use `device_type`. The parameters are poorly defined, and the source is deeply nested kwargs. It's turtles all...
I missed what you were originally saying with your reference to `load_huggingface_llama_model` and confused it with `load_huggingface_model`. I'll look into it.
@PromtEngineer It's the mock-up code from the original run script: [auto#transformers.AutoModelForCausalLM.from_pretrained](https://huggingface.co/docs/transformers/v4.30.0/en/model_doc/auto#transformers.AutoModelForCausalLM.from_pretrained)

```py
# run_localGPT.py
def load_model(device_type, model_id, model_basename=None):
    # other source
    elif (
        device_type.lower() == "cuda"
    ):  # The...
```
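For comparison, here's a minimal sketch of what the CUDA branch boils down to, per the `from_pretrained` docs linked above. The `device_map`/`torch_dtype` arguments are my assumption about the relevant knobs, not a quote from the repo:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_cuda_model(model_id: str):
    """Sketch of a full-model CUDA load; not the repo's actual loader."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",          # requires accelerate; places weights on GPU
        torch_dtype=torch.float16,  # halves memory vs. the fp32 default
    )
    return model, tokenizer
```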
It should work for HF models now:

```sh
00:03:02 | ~/Documents/code/git/localGPT git:(dev | θ)
λ python -m localGPT.run
2023-06-29 00:03:09,757 - INFO - model.py:81 - Using AutoModelForCausalLM for full models...
```
@PromtEngineer Start here: https://github.com/huggingface/transformers/blob/v4.30.0/src/transformers/models/auto/auto_factory.py#L432 Just follow the references through the source on your local machine if you want; that's how I did it. In most cases, you'll find that `cuda`...
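If tracing the nested kwargs gets tedious, the plain PyTorch route sidesteps them entirely: load first, then move the module explicitly (a sketch; `"gpt2"` is just a stand-in checkpoint, and this assumes the model fits in VRAM):

```py
from transformers import AutoModelForCausalLM

# Explicit placement after loading, instead of relying on kwargs that
# from_pretrained may silently swallow (sketch, not repo code).
model_id = "gpt2"  # stand-in checkpoint for the sketch
model = AutoModelForCausalLM.from_pretrained(model_id)
model.to("cuda")   # fails loudly if no CUDA device is available
```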
I tend to stay away from Apple hardware and software, so I'm not much help there unfortunately. I do understand the basics of Mac OS X, but that's pretty much it....
@PromtEngineer @LeafmanZ Off to a great start!

```sh
03:06:16 | ~/Documents/code/git/localGPT git:(dev | Δ)
λ python -m localGPT.ggml
llama.cpp: loading model from MODELS/vicuna-7B-1.1-GPTQ-4bit-128g-GGML/vicuna-7B-1.1-GPTQ-4bit-128g.GGML.bin
error loading model: unexpectedly reached end of...
```
Success!

```sh
03:48:38 | ~/Documents/code/git/localGPT git:(dev | Δ)
λ python -m localGPT.ggml --low_vram True --text_input "Hello! What is your name?"
llama.cpp: loading model from MODELS/orca_mini_7B-GGML/orca-mini-7b.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3...
```
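Roughly what the `localGPT.ggml` entry point is doing under the hood, assuming llama-cpp-python is the binding; the parameter names below match its mid-2023 API as I recall it, so treat them as assumptions rather than the repo's actual code:

```py
from llama_cpp import Llama

# Sketch of a GGML load via llama-cpp-python; params are assumptions.
llm = Llama(
    model_path="MODELS/orca_mini_7B-GGML/orca-mini-7b.ggmlv3.q4_0.bin",
    low_vram=True,    # mirrors --low_vram True above
    n_gpu_layers=32,  # offload layers to the GPU; tune to available VRAM
)
out = llm("Hello! What is your name?", max_tokens=64)
print(out["choices"][0]["text"])
```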