
How to convert an AutoModelForCausalLM object to a DSPy model object?

Open pawanGithub10 opened this issue 9 months ago • 5 comments

import dspy

llm = dspy.HFModel(model='model')  # 'model' here is a checkpoint path or Hub ID string

This method takes a string as input for the model. If I have a quantized model object of the AutoModelForCausalLM class, how can I convert the model object to a DSPy object?

Direct assignment gives an error at inference:

llm = model  # previously created as an AutoModelForCausalLM instance

llm("Testing testing, is anyone out there?")

Error after code line 4:

File /opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:623, in LlamaModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    621         raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
    622     elif input_ids is not None:
--> 623         batch_size, seq_length = input_ids.shape
    624     elif inputs_embeds is not None:
    625         batch_size, seq_length, _ = inputs_embeds.shape

AttributeError: 'str' object has no attribute 'shape'
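The AttributeError comes from the string being passed straight into forward(): a bare AutoModelForCausalLM has no text interface, so the prompt must be tokenized first. A minimal sketch of calling such a model directly (the model ID below is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model ID; any causal LM with a matching tokenizer works the same way.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize first: forward()/generate() expect input_ids tensors, not raw strings.
inputs = tokenizer("Testing testing, is anyone out there?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))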

pawanGithub10 avatar May 13 '24 04:05 pawanGithub10

I see, but internally the HF module uses the AutoModel module to instantiate the weights. So can you explain why we need to pass an already loaded model to DSPy instead of just giving the weight path?
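For illustration, that internal path looks roughly like the simplified sketch below (not the actual dspy source; the class name is made up). It shows why the constructor expects a path or Hub ID string rather than a live model object:

from transformers import AutoModelForCausalLM, AutoTokenizer

class HFModelSketch:
    """Illustrative sketch only, not the real dspy.HFModel."""

    def __init__(self, model: str):
        # from_pretrained resolves a local path or a Hub model ID, so a
        # pre-instantiated AutoModelForCausalLM object does not fit here.
        self.tokenizer = AutoTokenizer.from_pretrained(model)
        self.model = AutoModelForCausalLM.from_pretrained(model)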

Anindyadeep avatar May 14 '24 05:05 Anindyadeep

I see, but internally the HF module uses the AutoModel module to instantiate the weights. So can you explain why we need to pass an already loaded model to DSPy instead of just giving the weight path?

Thanks for the reply. The reason is that I have a 4-bit quantized model and I want to use it directly. I tried saving it to Hugging Face first so that I could load it from a weight path, but then I get an error saying that Hugging Face does not support saving a 4-bit quantized model.
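For context, the failing save can be reproduced with a sketch like this (model ID and output path are illustrative; with older transformers/bitsandbytes versions, saving a 4-bit bitsandbytes model was rejected, while newer versions added 4-bit serialization):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative model ID and output path.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# On older transformers/bitsandbytes versions this raised an error
# (4-bit models could not be serialized); newer versions support it.
model.save_pretrained("./llama-2-7b-chat-4bit")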

pawanGithub10 avatar May 14 '24 06:05 pawanGithub10

Can you please share the full code for the loading process and your approach? I would appreciate it.

Anindyadeep avatar May 14 '24 06:05 Anindyadeep

dspy_4bitquantized_llama2_error.zip

I have attached the Jupyter notebook. In this notebook, when I convert the quantized model, it searches for config.json because I am passing the AutoModel variable. Please suggest a workaround or an API call that lets me use the quantized model.

pawanGithub10 avatar May 14 '24 08:05 pawanGithub10

Hey @pawanGithub10, I have started raising a PR based on the issue you faced. Here is what some of the model-loading cases would look like:


from dsp.modules.hf_new import HFLocalModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "../models/llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


def case1():
    # Case 1: pass a model path and let HFLocalModel load and quantize it.
    model = HFLocalModel(
        model=model_path,
        tokenizer=tokenizer,
        load_in_4bit=True,
        bnb_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype="float16",
            bnb_4bit_use_double_quant=False,
        ),
    )

    response = model("hello", do_sample=True)
    print(response)


def case2():
    # Case 2: quantize the model yourself, then hand the loaded
    # AutoModelForCausalLM instance directly to HFLocalModel.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype="float16",
            bnb_4bit_use_double_quant=False,
        ),
    )

    model_ = HFLocalModel(
        model=model,
        tokenizer=tokenizer,
    )
    response = model_("hello", do_sample=True)
    print(response)


if __name__ == "__main__":
    case1()
    print("---------------------------")
    case2()

Additionally, PEFT models are now supported, along with multi-GPU support. The remaining problem is that I can only test up to the PEFT case; testing multi-GPU support is not possible for me, since I have no access to a multi-GPU setup.
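For reference, the branching idea behind accepting either a path string or a preloaded model can be sketched as follows (names are hypothetical; this is not the PR code):

from typing import Union

from transformers import AutoModelForCausalLM, PreTrainedModel

class DualInputModel:
    """Hypothetical sketch: accept a path/Hub ID string or a preloaded model."""

    def __init__(self, model: Union[str, PreTrainedModel], **load_kwargs):
        if isinstance(model, str):
            # A string is treated as a local path or Hub ID and loaded from disk.
            self.model = AutoModelForCausalLM.from_pretrained(model, **load_kwargs)
        else:
            # A preloaded (e.g. already 4-bit quantized) model is used as-is.
            self.model = model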

Anindyadeep avatar May 20 '24 05:05 Anindyadeep