localGPT
How to use model branches?
Hello everyone,
Let's say I want to use TheBloke/Llama-2-Coder-7B-GPTQ. If you scroll down, that model has a few branches such as main, gptq-4bit-32g-actorder_True, gptq-8bit--1g-actorder_True, etc.
e.g.: https://huggingface.co/TheBloke/Llama-2-Coder-7B-GPTQ vs https://huggingface.co/TheBloke/Llama-2-Coder-7B-GPTQ/tree/gptq-8bit--1g-actorder_True
Every branch has its own 'model.safetensors'. How do I make localGPT use one of the branches instead of the default 'main'?
Thanks in advance, and sorry if this information is already documented somewhere; I couldn't find it.
Anyone? From https://huggingface.co/TheBloke/Llama-2-13B-LoRA-Assemble-GPTQ:
```python
from transformers import AutoModelForCausalLM

model_name_or_path = "TheBloke/Llama-2-13B-LoRA-Assemble-GPTQ"
# To use a different branch, change revision
# For example: revision="main"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto",
                                             trust_remote_code=False, revision="main")
```
I tried to implement the revision parameter, but it doesn't work...
-removed- I am now using GGUF models instead of GPTQ. Sometimes I get empty answers:
```
Question: what is in the text ?
Answer:

Enter a query:
```
I wonder whether the quantized models behave differently from the rest, and whether the code just isn't suited for them yet?
@N1h1lv5 I am running into the same issue, empty answers. Still debugging it. It might be related to the promptTemplate. Will debug.
You are the best! Thanks!
Has anyone found a workaround for this? There are a few models I'd like to try but I need to use a branch other than 'main' due to VRAM constraints. Everything I've attempted so far either downloads the main branch or fails to find the specified model.
I am wondering this as well. One way would be to use the Hugging Face CLI to download the model manually.
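For example, a minimal sketch using the huggingface_hub Python API rather than the CLI; the branch name and target directory below are just placeholders for whatever you actually need:

```python
from huggingface_hub import snapshot_download

# Download a specific branch of the GPTQ repo into a local folder
# (requires a reasonably recent huggingface_hub).
snapshot_download(
    repo_id="TheBloke/Llama-2-Coder-7B-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # the branch instead of "main"
    local_dir="models/llama-2-coder-7b-gptq-4bit-32g",
)
```

Since from_pretrained / from_quantized also accept local paths, pointing MODEL_ID at the downloaded folder should then sidestep the branch problem, though I haven't verified that with localGPT.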
You need to modify the code to use the revision parameter, as below. I am going to implement this locally and will add a REVISION setting alongside MODEL_ID and MODEL_BASENAME.
```python
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path, revision="gptq-4bit-32g-actorder_True",
                                           model_basename=model_basename, use_safetensors=True,
                                           trust_remote_code=True, device="cuda:0", quantize_config=None)
```
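For anyone who wants to wire this in before the change lands, here is a rough sketch of what I mean. MODEL_REVISION is a name I made up, and the loader function below only approximates how localGPT builds the model; it is not the actual repo code:

```python
# constants.py (sketch): add a revision next to the existing constants
MODEL_ID = "TheBloke/Llama-2-Coder-7B-GPTQ"
MODEL_BASENAME = "model"  # stem of model.safetensors in the branch (no extension)
MODEL_REVISION = "gptq-4bit-32g-actorder_True"  # hypothetical new constant

# loader (sketch): thread the revision through to AutoGPTQ
from auto_gptq import AutoGPTQForCausalLM

def load_quantized_model(model_id, model_basename, revision="main"):
    # revision selects which repo branch gets downloaded instead of "main"
    return AutoGPTQForCausalLM.from_quantized(
        model_id,
        revision=revision,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        quantize_config=None,
    )

model = load_quantized_model(MODEL_ID, MODEL_BASENAME, revision=MODEL_REVISION)
```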
Created a PR to support model branches: https://github.com/PromtEngineer/localGPT/pull/765 @PromtEngineer