
How to convert an AutoModelForCausalLM object to a dspy model object?

Open pawanGithub10 opened this issue 1 year ago • 5 comments

import dspy

llm = dspy.HFModel(model='model')

This method takes a string as the model input. If I have a quantized model object of the AutoModelForCausalLM class, how can I convert that model object into a dspy object?

Direct assignment gives an error at inference:

llm = model  # previously created as an AutoModelForCausalLM object

llm("Testing testing, is anyone out there?")

Error after code line 4:

File /opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:623, in LlamaModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    621     raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
    622 elif input_ids is not None:
--> 623     batch_size, seq_length = input_ids.shape
    624 elif inputs_embeds is not None:
    625     batch_size, seq_length, _ = inputs_embeds.shape

AttributeError: 'str' object has no attribute 'shape'
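
For context, the transformers forward pass expects tokenized input_ids tensors rather than a raw string, which is roughly what dspy.HFModel handles internally. A minimal sketch of those manual steps (the model path below is a placeholder, not taken from this issue):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-2-7b-chat-hf"  # placeholder local path

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

# Calling model("some string") fails because LlamaModel.forward expects input_ids
# tensors; the string has to be tokenized first and the output decoded afterwards.
inputs = tokenizer("Testing testing, is anyone out there?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))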

pawanGithub10 avatar May 13 '24 04:05 pawanGithub10

I see, but internally the HF module uses the AutoModel module to instantiate the weights. So can you explain why you need to pass an already loaded model to dspy instead of giving the weight path?

Anindyadeep avatar May 14 '24 05:05 Anindyadeep

I see, but internally the HF module uses the AutoModel module to instantiate the weights. So can you explain why you need to pass an already loaded model to dspy instead of giving the weight path?

Thanks for the reply. The reason is that I have a 4-bit quantized model and I want to use it directly. I tried to save it to Hugging Face first so that I could load it from a weight path, but then I get an error that Hugging Face does not support saving a 4-bit quantized model.

pawanGithub10 avatar May 14 '24 06:05 pawanGithub10

Can you please share the full code for the loading process and your approach? I would appreciate it.

Anindyadeep avatar May 14 '24 06:05 Anindyadeep

dspy_4bitquantized_llama2_error.zip

I have attached the Jupyter notebook. In this notebook, when I convert the quantized model, it searches for config.json because I am passing the AutoModel variable. Please suggest a workaround or an API call to use the quantized model.

pawanGithub10 avatar May 14 '24 08:05 pawanGithub10

Hey @pawanGithub10, I have started raising a PR after seeing the issue you faced. Here is what some of the model-loading cases would look like:


from dsp.modules.hf_new import HFLocalModel
from transformers import AutoTokenizer, BitsAndBytesConfig 
from transformers import AutoModelForCausalLM

model_path = "../models/llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


def case1():
    # Case 1: pass the model path and let HFLocalModel handle the 4-bit quantization.
    model = HFLocalModel(
        model=model_path,
        tokenizer=tokenizer,
        load_in_4bit=True,
        bnb_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4", 
            bnb_4bit_compute_dtype="float16", 
            bnb_4bit_use_double_quant=False
        )
    )

    response = model("hello", do_sample=True)
    print(response)


def case2():
    # Case 2: quantize the model yourself and pass the loaded AutoModelForCausalLM object.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4", 
            bnb_4bit_compute_dtype="float16", 
            bnb_4bit_use_double_quant=False
        )
    )

    model_ = HFLocalModel(
        model=model,
        tokenizer=tokenizer, 
    )
    response = model_("hello", do_sample=True)
    print(response)

if __name__ == "__main__":
    case1()
    print("---------------------------")
    case2()

Additionally, loading PEFT models is now supported, along with multi-GPU support. The problem is that I will only be able to test up to the PEFT case; testing the multi-GPU support is not possible for me, since I have no access to a multi-GPU setup.
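
For the PEFT case, attaching a LoRA adapter with the peft library typically looks like the sketch below; the adapter path is hypothetical, and the exact HFLocalModel interface for PEFT models may differ from this:

from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_path = "../models/llama-2-7b-chat-hf"  # same base model path as above

# Load the quantized base model as in case2(), then attach the adapter weights.
base_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
)
adapter_path = "../models/llama-2-7b-lora-adapter"  # hypothetical adapter directory
peft_model = PeftModel.from_pretrained(base_model, adapter_path)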

Anindyadeep avatar May 20 '24 05:05 Anindyadeep

@Anindyadeep, thanks a lot for the detailed help, but I feel that I had missed the documentation details.

Init signature:
dspy.HFModel(
    model: str,
    checkpoint: Optional[str] = None,
    is_client: bool = False,
    hf_device_map: Literal['auto', 'balanced', 'balanced_low_0', 'sequential'] = 'auto',
    token: Optional[str] = None,
    model_kwargs: Optional[dict] = {},
)
Docstring:      Abstract class for language models.
Init docstring:
Args:
    model (str): HF model identifier to load and use
    checkpoint (str, optional): load specific checkpoints of the model. Defaults to None.
    is_client (bool, optional): whether to access models via client. Defaults to False.
    hf_device_map (str, optional): HF config strategy to load the model. Recommended to use "auto",
        which will help loading large models using accelerate. Defaults to "auto".
    model_kwargs (dict, optional): additional kwargs to pass to the model constructor. Defaults to empty dict.
File:           /opt/conda/lib/python3.11/site-packages/dsp/modules/hf.py
Type:           ABCMeta
Subclasses:     HFClientTGI, HFClientVLLM, Together, Anyscale, ChatModuleClient, HFClientSGLang

So after reading this, I made the following changes and the code works:

import torch
import dspy

model_specific_param = {
    "torch_dtype": torch.float16,
    "quantization_config": bnb_config,
}
model_name = '/tmp/models/llama2/7b'
llm = dspy.HFModel(model=model_name, model_kwargs=model_specific_param)
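
Here bnb_config is presumably a BitsAndBytesConfig defined earlier in the notebook; a minimal sketch of such a config, mirroring the settings used earlier in this thread:

from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization settings, as shown in the earlier examples
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=False,
)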

pawanGithub10 avatar May 22 '24 04:05 pawanGithub10

As per the previous comment, I think the issue can be closed.

pawanGithub10 avatar May 22 '24 04:05 pawanGithub10