Num examples of SFTTrainer decreased to 4862 from 109955 (original data)
This is my attempt at corpus training with unsloth. The model loading is the same as in the unsloth example code.
I then changed r and alpha from the default 16 to 64 and added dropout (0.1):
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0.1,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
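For context, LoRA adds two small matrices per targeted weight, so the number of trainable parameters grows roughly linearly with r. A back-of-the-envelope sketch only, assuming Llama-3-8B-sized projection shapes since the post does not state the model (the dimensions and layer count below are assumptions):

# Rough sketch: each LoRA adapter adds A (d_out x r) + B (r x d_in) parameters.
# Shapes assume a Llama-3-8B-style decoder with 32 layers; adjust for the real model.
r = 64
shapes = {
    "q_proj":    (4096, 4096),
    "k_proj":    (1024, 4096),
    "v_proj":    (1024, 4096),
    "o_proj":    (4096, 4096),
    "gate_proj": (14336, 4096),
    "up_proj":   (14336, 4096),
    "down_proj": (4096, 14336),
}
per_layer = sum(r * (d_out + d_in) for d_out, d_in in shapes.values())
print(per_layer * 32)  # ~167M trainable params at r=64, versus ~42M at the default r=16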
The dataset (name = combined_dataset) consists of a bunch of sentences, as you can see:
print("Dataset structure:", combined_dataset)
and I used the same code as the unsloth example accordingly (train_dataset, dataset_text_field):
EOS_TOKEN = tokenizer.eos_token

def formatting_func(example):
    return example["sentence"] + EOS_TOKEN
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    train_dataset = combined_dataset,
    dataset_text_field = "sentence",
    tokenizer = tokenizer,
    max_seq_length = max_seq_length,
    packing = True,
    formatting_func = formatting_func,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.03,
        max_grad_norm = 1.0,
        num_train_epochs = 1,
        learning_rate = 2e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.1,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
    ),
)
Then, when I train with trainer_stats = trainer.train(), it shows that Num examples decreased.
I did not notice this at first and waited for the result.
8552.5999 seconds used for training.
142.54 minutes used for training.
Peak reserved memory = 11.16 GB.
Peak reserved memory for training = 4.801 GB.
Peak reserved memory % of max memory = 23.477 %.
Peak reserved memory for training % of max memory = 10.1 %.
This is the wandb result you might need.
I cannot clearly say the model is well-trained when I try to run inference as intended. As soon as I noticed the decreased num_examples, I re-ran all the code just in case, but it shows the same decreased number (4862). Now I am not sure whether I did something wrong or whether it is a bug.
@skmanzg Yes, packing = True essentially combines short and long sequences into one example, hence the count decreases.
Would it be OK to say it trained on all 109955 examples then? One more question: can you link the source or explain how packing works in detail?
@skmanzg https://huggingface.co/docs/trl/en/sft_trainer#packing-dataset--constantlengthdataset-
I would turn it off to see if the results are better
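For intuition, packing roughly works like the sketch below. This is a simplified illustration, not the actual trl ConstantLengthDataset code, and the helper name pack_examples is made up:

# Simplified idea of packing: tokenize every text, concatenate the token
# streams, then cut them into fixed-length blocks of max_seq_length.
def pack_examples(texts, tokenizer, max_seq_length):
    all_ids = []
    for text in texts:
        all_ids.extend(tokenizer(text)["input_ids"])
    # Each block becomes one "example" for the trainer.
    return [all_ids[i : i + max_seq_length]
            for i in range(0, len(all_ids), max_seq_length)]

# So the reported Num examples ends up roughly total_tokens / max_seq_length
# (here 4862), even though all 109955 sentences are still consumed.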
@danielhanchen This is the result without packing.
I had to reduce the size of the LoRA and change parameters to keep the loss from just oscillating. Although it may look less stable than the packed run, at least it used all of the data... What do you think of this?
Yes looks fine to me!
probs increase grad accumulation steps to smooth out the loss
Increasing grad accumulation might smooth out the loss? OK, thank you.
Hmm probs not - i would just inc grad accum
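For reference, gradient accumulation raises the effective batch size without extra GPU memory, so each optimizer step averages gradients over more examples and the logged loss usually looks smoother. A sketch of the change, with the numbers chosen only as an example, not a recommendation:

from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps.
# Going from 4 to 16 accumulation steps averages over 4x more examples per
# optimizer step, which usually smooths the logged loss curve.
args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 16,  # was 4; effective batch size 2 * 16 = 32
    learning_rate = 2e-5,
    num_train_epochs = 1,
    output_dir = "outputs",
)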
I am using packing = False and am still getting a much smaller Num examples:
Map (num_proc=15): 100%|██████████| 198460/198460 [05:22<00:00, 615.72 examples/s]
Unsloth - 2x faster free finetuning | Num GPUs = 1
Num examples = 210 | Num Epochs = 3
Batch size per device = 2 | Gradient Accumulation steps = 4
Total batch size = 8 | Total steps = 78
Number of trainable parameters = 20,766,720
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    formatting_func = format_instruction,
    max_seq_length = max_seq_length,
    dataset_num_proc = 15,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 3, # Set this for 1 full training run.
        # num_train_epochs = 5
        save_strategy = "steps",
        save_steps = 0.05,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        # bf16 = is_bfloat16_supported(),
        bf16 = True,
        warmup_steps = 10,
        logging_steps = 20,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "/clearml_agent_cache/storage_manager/bhupendra_workdir/gemma-2-2b-fintune-dir/checkpoints_gemma2b-2-050824/",
    ),
)
Hey sorry, I fixed it. The problem was with my formatting function; it used to work with batch_size = 1 with SFTTrainer directly from trl.
New formatting function:
def formatting_prompts_func(examples):
    texts = []
    prompts = examples["prompt"]
    outputs = examples["selected_response"]
    for prompt, output in zip(prompts, outputs):
        text = f"""{prompt}\n\n{output}""" + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
Problematic old function:

def format_instruction(sample):
    return [f"""{sample['prompt']}\n\n{sample['selected_response']}"""]
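Note for anyone hitting the same thing: as far as I can tell, the formatting function is applied to batches (a dict of lists), so it has to return one string per row. The old version stringified the whole lists into a single text per batch, which would explain why 198460 rows collapsed to only 210 examples. A minimal illustration with a made-up two-row batch:

# Made-up two-row batch, shaped like what a batched map passes in.
batch = {
    "prompt": ["Q1", "Q2"],
    "selected_response": ["A1", "A2"],
}

EOS_TOKEN = "</s>"  # placeholder; in practice use tokenizer.eos_token

# Old function: the f-string renders the whole lists, so every batch
# collapses into a single text regardless of how many rows it holds.
def format_instruction(sample):
    return [f"""{sample['prompt']}\n\n{sample['selected_response']}"""]

print(len(format_instruction(batch)))                      # 1

# New function: iterate row by row and return one text per row.
def formatting_prompts_func(examples):
    texts = []
    for prompt, output in zip(examples["prompt"], examples["selected_response"]):
        texts.append(f"{prompt}\n\n{output}" + EOS_TOKEN)
    return { "text": texts }

print(len(formatting_prompts_func(batch)["text"]))          # 2, one per row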