
Is this the correct approach to do Prompt Tuning on the DollyV2 model?

Open pratikchhapolika opened this issue 2 years ago • 2 comments

I am using this link to study Prompt Tuning: Parameter-Efficient Fine-Tuning using 🤗 PEFT

Please let me know if this is the correct way to do prompt tuning and save the model.

It has 4 options. I am interested in option 3.

  1. Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning

Can I use Prompt Tuning to tune the DollyV2 [databricks/dolly-v2-12b · Hugging Face] model?

The use case is:

I have a Context with a lot of paragraphs and then a Question, and the model has to answer the Question based on the Context in a professional manner. Also, can it classify the Question as relevant if the answer is present in the Context and irrelevant if the answer is not in the Context?

The code that I have written is:

import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=31,
    prompt_tuning_init_text="Answer the question as truthfully as possible using and only using the provided context and if the answer is not contained within the context/text, say Irrelevant",
    tokenizer_name_or_path="dolly-v2-3b"
)
tokenizer = AutoTokenizer.from_pretrained("dolly-v2-3b")
model = AutoModelForCausalLM.from_pretrained("dolly-v2-3b",load_in_8bit=True,device_map='auto') #,load_in_8bit=True

model = get_peft_model(model, peft_config)

train_data = [
    {
        "Context": "How to Link Credit Card to ICICI Bank Account Step 1: Login to ICICIBank.com using your existing internet banking credentials. Step 2: Go to the 'Service Request' section. Step 3: Visit the 'Customer Service' option. Step 4: Select the Link Accounts/ Policy option to link your credit card to the existing user ID.",
        "Question": "How to add card?",
        "Answer": "Relevant. To add your card you can follow these steps: Step 1: Login to ICICIBank.com using your existing internet banking credentials. Step 2: Go to the 'Service Request' section. Step 3: Visit the 'Customer Service' option. Step 4: Select the Link Accounts/ Policy option to link your credit card to the existing user ID."
    },
    {
        "Context": "The python programming language is used in many different fields including web development, data analysis, artificial intelligence and scientific computing. It is a high-level language that is easy to learn and has a large community of users who can provide support and advice. ",
        "Question": "What is Python used for?",
        "Answer": "Relevant. Python is used in many different fields including web development, data analysis, artificial intelligence and scientific computing."
    },
    {
        "Context": "The United States is a federal republic consisting of 50 states, a federal district, five major self-governing territories, and various possessions. It has a population of over 330 million people and is the third most populous country in the world. The capital and largest city is Washington, D.C.",
        "Question": "What is the population of the United States?",
        "Answer": "Relevant. The United States has a population of over 330 million people."
    }]

Define a function to map examples to inputs and targets

def preprocess_function(examples):
    tokenized_examples = tokenizer(
        examples["Question"][0],
        examples["Context"][0],
        truncation=True,
        max_length=1024,
        padding="max_length"
    )
    tokenized_examples['labels']=tokenizer(
        examples["Answer"],
        truncation=True,
        max_length=1024,
        padding="max_length",
        return_tensors="pt")['input_ids'][0]
        
    return tokenized_examples

tokenized_train_data = [preprocess_function(example) for example in train_data]


class DemoDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        
        item = {k: torch.tensor(v) for k, v in sample.items()}
        return item

dataset = DemoDataset(tokenized_train_data)

training_args = TrainingArguments(
    output_dir="results",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    num_train_epochs=10,
    weight_decay=0.01,
    logging_steps=1,
    save_steps=1,
    logging_dir="logs"
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    # data_collator=data_collator,
    tokenizer=tokenizer
)
trainer.train()

trainer.save_model("dolly3b_demo_model")

Inference

from peft import PeftModel, PeftConfig
tokenizer = AutoTokenizer.from_pretrained("dolly-v2-3b")
model = AutoModelForCausalLM.from_pretrained("dolly3b_demo_model")
model = get_peft_model(model, peft_config)

# Define example
context = "How to Link Credit Card to ICICI Bank Account Step 1: Login to ICICIBank.com using your existing internet banking credentials. Step 2: Go to the 'Service Request' section. Step 3: Visit the 'Customer Service' option. Step 4: Select the Link Accounts/ Policy option to link your credit card to the existing user ID."
question = "How to add card?"

# Encode inputs with prompt and tokenize
inputs = [f"{context} {question}"]
inputs_encoded = tokenizer(inputs, padding=True, truncation=True, max_length=1024, return_tensors="pt")
outputs = model.generate(input_ids=inputs_encoded["input_ids"], attention_mask=inputs_encoded["attention_mask"], max_new_tokens=200, eos_token_id=3)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))

Is the inference and training code correct? The inference takes a lot of time to generate and produces some gibberish output.

pratikchhapolika avatar May 12 '23 05:05 pratikchhapolika

What GPU are you running on? eos_token looks wrong in your last snippet. Can you just use the provided pipeline? I don't think you need an attention mask either.

This model is already fine-tuned for that kind of response anyway. Do you need this?
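
For reference, a minimal sketch of using the provided pipeline (based on the usage shown on the Hugging Face model card; the prompt below is just an illustration):

import torch
from transformers import pipeline

# the Dolly model card loads its instruct pipeline via trust_remote_code
generate_text = pipeline(model="databricks/dolly-v2-3b",
                         torch_dtype=torch.bfloat16,
                         trust_remote_code=True,
                         device_map="auto")

# the pipeline returns a list of dicts with the generated response
res = generate_text("How do I link a credit card to my ICICI bank account?")
print(res[0]["generated_text"])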

srowen avatar May 12 '23 12:05 srowen

Here's an example of PEFT/LoRA. I'm implementing this myself now, and will report back if it works.

opyate avatar May 12 '23 13:05 opyate

In LoRA, you save the adapter. Then load the base model, then load the adapter after supplying the base model. See the docs and examples on HF.

LoRA does not give you a fine tuned model. It gives you a modification to a model.
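
As a rough sketch of that load order (assuming the adapter weights were saved to a local directory; the "dolly3b_demo_model" path from the snippet above is reused here for illustration):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# load the base model first, then attach the saved adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto")
model = PeftModel.from_pretrained(base_model, "dolly3b_demo_model")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

The point is that the saved directory only contains the adapter, so it is loaded on top of the base model rather than directly with AutoModelForCausalLM.from_pretrained.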

On Sun, May 14, 2023, 5:51 AM Juan M Uys @.***> wrote:

I integrated PEFT/LoRa like so:

diff training/trainer.py training/trainer_peft.py

44a45,54
> import torch
> import torch.nn as nn
> import bitsandbytes as bnb
> import transformers
> from transformers import AutoModel, AutoConfig, GPTJForCausalLM
>
> from peft import prepare_model_for_int8_training, LoraConfig, get_peft_model
>
>
>
48a59,63
> LORA_R = 4
> LORA_ALPHA = 16
> LORA_DROPOUT = 0.05
>
>
141a157,170
>
> # maybe enable this?
> # model = prepare_model_for_int8_training(model, use_gradient_checkpointing=gradient_checkpointing)
>
> # this is new (from https://github.com/kw2828/Dolly-2.0-Series/blob/main/fine_tuning_dolly_v2_lora_alpaca.ipynb)
> config = LoraConfig(
>     r=LORA_R,
>     lora_alpha=LORA_ALPHA,
>     lora_dropout=LORA_DROPOUT,
>     bias="none",
>     task_type="CAUSAL_LM",
> )
> model = get_peft_model(model, config)
>

It trains, but doesn't generate a config.json, so I can't load the model for inference. One also can't just copy a config.json from a regular run, because the architectures are now different:

Some weights of the model checkpoint at /local_disk0/dolly_training/my_training_run were not used when initializing GPTNeoXForCausalLM: ...etc...


srowen avatar May 14 '23 13:05 srowen

(sorry, I deleted my comment, because I thought there might be more to do on my part, but thanks for the pointers!)

opyate avatar May 14 '23 13:05 opyate

Just FYI, it seems there's a command length limit. I've got too many CLI params now, and my app is getting junk input.

(screenshot attached showing the notebook cells and their output)

opyate avatar May 14 '23 20:05 opyate

Hm, what is this showing? I am not sure why the second one didn't do shell escaping, but I think your token doesn't appear possibly because it was redacted? It would be, if it's stored as a secret. Is there more, like this causing an issue when used for real while the token is definitely not shown?

srowen avatar May 14 '23 20:05 srowen

Yes, the {num_gpus_flag} is passed literally:

/local_disk0/.ephemeral_nfs/envs/pythonEnv-437fc41d-a20b-4579-bf72-9fbfcb02b73f/bin/python: can't open file '/Workspace/Repos/[email protected]/dolly/{num_gpus_flag}': [Errno 2] No such file or directory

I thought the command was just too long, but you're right: it's something to do with the token.

opyate avatar May 14 '23 22:05 opyate

I think it may be subtler than that: it isn't actually supporting multiple commands in a cell the way you are trying to supply them. You're actually passing one long command to bash, where the ! characters are causing it to reinsert the previous echo command. You can chain commands as in bash with &&, or try multiple cells. It's clearly working in the first instance.

srowen avatar May 14 '23 23:05 srowen

What GPU are you running on? eos_token looks wrong in your last snippet. Can you just use the provided pipeline? I don't think you need an attention mask either.

This model is already fine-tuned for that kind of response anyway. Do you need this?

Hi @srowen, I am running on a single A100 GPU. What should the eos_token be? I can't remove the attention mask, as it gives an error after removing attention_mask. But it is still giving gibberish output.

question= "QUESTION :: How to unlock PayPal account? ANSWER :: "
inputs_encoded = tokenizer(question, padding=True, truncation=True, max_length=800, return_tensors="pt")

outputs = model.generate(input_ids=inputs_encoded["input_ids"], max_new_tokens=100)

RuntimeError: The size of tensor a (54) must match the size of tensor b (14) at non-singleton dimension 3

pratikchhapolika avatar May 15 '23 04:05 pratikchhapolika

trainer.train()

@srowen

I am doing prompt tuning and training the model using the Trainer API, then saving the model locally using trainer.save_model("dolly3b_demo_model"). Do I need to upload it to Hugging Face? I do not have a pipeline. I save the model and then use that model to generate.

pratikchhapolika avatar May 15 '23 04:05 pratikchhapolika

I think it may be subtler than that: it isn't actually supporting multiple commands in a cell the way you are trying to supply them. You're actually passing one long command to bash, where the ! characters are causing it to reinsert the previous echo command. You can chain commands as in bash with &&, or try multiple cells. It's clearly working in the first instance.

Hi @srowen, I'm continuing the CLI discussion over here: #165

opyate avatar May 15 '23 09:05 opyate

Hi @srowen, any help on this?

pratikchhapolika avatar May 17 '23 15:05 pratikchhapolika