
Num examples of SFTTrainer decreased to 4862 from 109955 (original data)

Open skmanzg opened this issue 1 year ago • 5 comments

This is my attempt at corpus training in Unsloth. The model load is the same as in the Unsloth example code.

[screenshot: Unsloth model loading code]
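For reference, that standard loading snippet looks roughly like this (the model name and max_seq_length values here are placeholders, not necessarily what was used):

from unsloth import FastLanguageModel

max_seq_length = 2048  # placeholder; the actual value used is not shown

# Standard Unsloth example loader: returns a patched model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # placeholder model name
    max_seq_length = max_seq_length,
    dtype = None,          # auto-detect (bf16 on Ampere+, fp16 otherwise)
    load_in_4bit = True,   # QLoRA-style 4-bit loading
)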

Then I changed r and alpha from the default 16 to 64 and added dropout (0.1):

model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0.1, 
    bias = "none",   
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,  
    loftq_config = None, 
)
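As a side note, standard LoRA scales its update by lora_alpha / r, so r = 64 with lora_alpha = 64 keeps the same scaling factor (64 / 64 = 1.0) as the default 16 / 16; the change mainly buys a higher adapter rank (more trainable parameters), plus some regularization from the added dropout.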

The dataset (name = combined_dataset) consists of a bunch of sentences, as you can see from print("Dataset structure:", combined_dataset):

[screenshot: printed dataset structure]
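The printout would look roughly like this (assuming a single "sentence" column and the 109955 rows mentioned in the title):

Dataset structure: Dataset({
    features: ['sentence'],
    num_rows: 109955
})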

and I used the same code from the Unsloth example accordingly (train_dataset, dataset_text_field):

EOS_TOKEN = tokenizer.eos_token

def formatting_func(example):
    return example["sentence"] + EOS_TOKEN
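So each row becomes a single training string terminated by the tokenizer's EOS token, roughly like this (the EOS string shown is an assumption; it depends on the tokenizer):

# Hypothetical example row; the EOS string depends on the tokenizer
# (e.g. "</s>" for Llama-2-style tokenizers).
formatting_func({"sentence": "The quick brown fox."})
# -> 'The quick brown fox.</s>'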

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    train_dataset = combined_dataset,
    dataset_text_field = "sentence",
    tokenizer = tokenizer,
    max_seq_length = max_seq_length,
    packing = True, 
    formatting_func = formatting_func,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.03,
        max_grad_norm = 1.0,
        num_train_epochs = 1,
        learning_rate = 2e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.1,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
    ),
)
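For context, these arguments give an effective batch size of per_device_train_batch_size × gradient_accumulation_steps = 2 × 4 = 8 sequences per optimizer step, so the 4862 packed examples reported below correspond to roughly 4862 / 8 ≈ 608 steps for the single epoch.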

Then, when I train with trainer_stats = trainer.train(), it shows that the Num examples count has decreased:

[screenshot: training log showing Num examples = 4862]

But I did not notice this at the time and waited for the result.

8552.5999 seconds used for training.
142.54 minutes used for training.
Peak reserved memory = 11.16 GB.
Peak reserved memory for training = 4.801 GB.
Peak reserved memory % of max memory = 23.477 %.
Peak reserved memory for training % of max memory = 10.1 %.

This is the wandb result, in case you need it.

[wandb screenshots: training curves]

I cannot clearly say the model is well-trained when I try to run inference as intended. As soon as I noticed the num_examples decrease, I re-ran all the code just in case, but it shows the same decreased number (4862). Now I am not sure whether I did something wrong or whether it is a bug.

skmanzg avatar May 24 '24 05:05 skmanzg

@skmanzg Yes, packing = True essentially combines short and long sequences into one example, hence the count decreases.

danielhanchen avatar May 24 '24 10:05 danielhanchen
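For a rough intuition of why the count drops: packing tokenizes all the sentences, concatenates them into one long token stream, and slices that stream into fixed max_seq_length blocks, so many short sentences end up inside a single training example. A minimal sketch of the idea (an illustration only, not Unsloth's or TRL's actual implementation):

def pack_examples(tokenized_examples, max_seq_length, eos_token_id):
    # Concatenate every tokenized sentence into one long token stream,
    # keeping an EOS token as the boundary between original examples.
    stream = []
    for ids in tokenized_examples:
        stream.extend(ids + [eos_token_id])
    # Slice the stream into fixed-size blocks; each block is one
    # "packed" training example.
    return [stream[i : i + max_seq_length]
            for i in range(0, len(stream) - max_seq_length + 1, max_seq_length)]

With 109955 short sentences this can easily collapse into a few thousand packed blocks, which is the 4862 shown in the log.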

> @skmanzg Yes, packing = True essentially combines short and long sequences into one example, hence the count decreases.

Would it be OK to say it trained on all 109955 examples then? One more question: can you link a source or explain in detail how packing works?

skmanzg avatar May 24 '24 23:05 skmanzg

@skmanzg https://huggingface.co/docs/trl/en/sft_trainer#packing-dataset--constantlengthdataset-

danielhanchen avatar May 25 '24 09:05 danielhanchen

I would turn it off to see if the results are better

danielhanchen avatar May 25 '24 09:05 danielhanchen

@danielhanchen This is the result without packing.

[screenshots: wandb results without packing, 2024-05-28]

I had to reduce the LoRA size and change some parameters to avoid a state where the loss only oscillates. Although it may look less stable than the packed run, at least it used all the data for each step... What do you think of this?

skmanzg avatar May 27 '24 23:05 skmanzg

Yes looks fine to me!

danielhanchen avatar May 28 '24 14:05 danielhanchen

probs increase grad accumulation steps to smooth out the loss

danielhanchen avatar May 28 '24 14:05 danielhanchen

Increasing grad accumulation might smooth out the loss? OK, thank you.

skmanzg avatar May 28 '24 23:05 skmanzg

Hmm probs not - i would just inc grad accum

danielhanchen avatar May 29 '24 06:05 danielhanchen
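For context, raising gradient_accumulation_steps averages the gradient over more sequences before each optimizer step, at no extra GPU memory cost, which usually smooths the logged loss. A sketch of the change (the value 16 is only an example):

from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps.
# Going from 4 to 16 accumulation steps raises it from 2 * 4 = 8 to 2 * 16 = 32.
args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 16,
)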

I am using packing = False and am still getting a much smaller Num examples:

Map (num_proc=15): 100%|██████████| 198460/198460 [05:22<00:00, 615.72 examples/s]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 210 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 78
 "-____-"     Number of trainable parameters = 20,766,720


trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    formatting_func = format_instruction,
    max_seq_length = max_seq_length,
    dataset_num_proc = 15,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 3, # Set this for 1 full training run.
        # num_train_epochs = 5
        save_strategy = "steps",
        save_steps = 0.05,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        # bf16 = is_bfloat16_supported(),
        bf16 = True,
        warmup_steps = 10,
        logging_steps = 20,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "/clearml_agent_cache/storage_manager/bhupendra_workdir/gemma-2-2b-fintune-dir/checkpoints_gemma2b-2-050824/",
    ),
)

bhupendrathore avatar Aug 05 '24 10:08 bhupendrathore

Hey, sorry, I fixed it. The problem was with my formatting function. It used to work with batch_size = 1 with SFT directly in TRL.

New formatting function:


def formatting_prompts_func(examples):
    texts = []
    prompts = examples["prompt"]
    outputs = examples["selected_response"]
    
    for prompt, output in zip(prompts, outputs):
        text = f"{prompt}\n\n{output}" + EOS_TOKEN
        texts.append(text)
    return { "text" : texts }

Problematic old function:


def format_instruction(sample):
    return [f"{sample['prompt']}\n\n{sample['selected_response']}"]
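That explains the collapse: the formatting function is applied to batched examples, so sample['prompt'] is a whole list of prompts, and the old function stringified that list into a single element, producing one (garbage) training example per map batch instead of one per row. A small demonstration with a hypothetical two-row batch:

batch = {"prompt": ["p1", "p2"], "selected_response": ["r1", "r2"]}

format_instruction(batch)
# -> ["['p1', 'p2']\n\n['r1', 'r2']"]   # one mangled string for the whole batch

formatting_prompts_func(batch)
# -> {"text": ["p1\n\nr1<EOS>", "p2\n\nr2<EOS>"]}   # one string per row

Assuming the datasets library's default map batch size of 1000, 198460 rows split over 15 processes is ceil(198460 / 15) ≈ 13231 rows per process, or 14 batches each, and 15 × 14 = 210 — exactly the reported Num examples.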

bhupendrathore avatar Aug 05 '24 13:08 bhupendrathore