
[BUG] Why does increasing model_max_length result in fine-tuning not working?

jackswl opened this issue on Feb 17 '24 · 5 comments

Prerequisites

  • [X] I have read the documentation.
  • [X] I have checked other issues for similar problems.

Backend

Local

Interface Used

CLI

CLI Command

--model_max_length 128 \
--block-size 128 \

and

--model_max_length 4096 \
--block-size 4096 \

UI Screenshots & Parameters

No response

Error Logs

To test whether my fine-tuning works, I use the input 'Who is Bob?' and the output 'Bob is Jack's uncle's father's mother's granddaughter's husband'. I have 42 samples of this exact same input-output pair.

With model_max_length 128 this works flawlessly: the model overfits and outputs the exact response when I ask it. This is just a sanity check that the fine-tuning works.

However, when I increase model_max_length to 4096, with everything else the same, the model can no longer recall the answer. Why is this happening? Does increasing block_size / model_max_length simply result in the model no longer learning/overfitting? How do I prevent this?

@abhishekkrthakur, some insight would be greatly appreciated.

Additional Information

No response

jackswl · Feb 17 '24
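
For context on the scale involved, here is a minimal sketch that counts how many tokens the training sample actually occupies. The gpt2 tokenizer is used purely as a stand-in, since the base model is not specified in the issue:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in; use your actual base model's tokenizer

    prompt = "Who is Bob?"
    response = "Bob is Jack's uncle's father's mother's granddaughter's husband"
    n_tokens = len(tokenizer(prompt + " " + response)["input_ids"])
    print(n_tokens)  # far below both 128 and 4096

With such a short sample, almost all of a 4096-token context is left over, which is where the discussion below picks up.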

@abhishekkrthakur are you able to give insight on this? It seems like a major bug.

Does it mean that increasing model_max_length (or block_size) while keeping the data length the same will affect the fine-tuning process?

jackswl · Feb 18 '24

@abhishekkrthakur sorry, any insights on this?

It seems that when I increase block_size / model_max_length during fine-tuning to be much greater than the input token length, the model is no longer able to learn from the fine-tuning (even though it's severely overfitted).

jackswl · Feb 19 '24

Please be patient, @jackswl; many times an immediate response is not possible :) If your sentences are short and you are using a large max length, there will be many padding tokens, which may account for the model not learning properly. Given your data, you should choose the hyperparameters best suited to the model you are training. This is not a bug.

abhishekkrthakur · Feb 19 '24
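
A rough illustration of this point, assuming the common pad-to-max_length behaviour of a Hugging Face tokenizer (not a claim about autotrain's exact collator; gpt2 is again a stand-in):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
    tokenizer.pad_token = tokenizer.eos_token          # gpt2 defines no pad token by default

    text = "Who is Bob? Bob is Jack's uncle's father's mother's granddaughter's husband"

    for max_len in (128, 4096):
        enc = tokenizer(text, padding="max_length", truncation=True, max_length=max_len)
        n_real = sum(enc["attention_mask"])
        pad_frac = 1 - n_real / len(enc["input_ids"])
        print(max_len, f"{pad_frac:.1%} padding")

At max_length=4096 the sequence is overwhelmingly padding, so the real tokens make up only a tiny fraction of each training step.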

@abhishekkrthakur,

  1. For the padding argument, I left it at the default, which is 'none', but it also does not work with padding=right or padding=left.
  2. Even with padding, I have tried to overfit severely by training for many epochs, etc. The loss goes down to an abysmally small value, but the model is still not able to recall the sample dataset (there is only 1 sample, repeated 42 times).
  3. I started testing from 1024, then 2048, and so on, and it all works. But once it hits 4096, the model completely stops recalling, even if I increase the epochs drastically.

I am also worried about whether using padding=none (the default) is an issue in my fine-tuning process, because some of my samples are about 500 tokens while others reach 4096 (all are trimmed to a maximum of 4096). I am not sure if this will be a problem for fine-tuning. Do you use padding for your own fine-tuning?

jackswl · Feb 19 '24
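
One way to act on the maintainer's earlier advice is to look at the token-length distribution of the actual training data before picking block_size / model_max_length. A hedged sketch, assuming the data lives in a train.csv with a text column (both names are placeholders):

    import csv
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in; use your base model's tokenizer

    lengths = []
    with open("train.csv", newline="") as f:
        for row in csv.DictReader(f):
            lengths.append(len(tokenizer(row["text"])["input_ids"]))

    lengths.sort()
    print("max length:", lengths[-1])
    print("95th percentile:", lengths[int(0.95 * (len(lengths) - 1))])

Choosing model_max_length near the upper end of the real lengths keeps the padding fraction low while truncating only the rare very long sample.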

I would appreciate it if you could elaborate a little on which padding side you use for your own fine-tuning, and also for inference.

jackswl · Mar 04 '24
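
For reference, the general Hugging Face transformers convention (not autotrain-specific advice): decoder-only models are typically trained with right padding, while batched generation pads on the left so that every prompt ends immediately before the newly generated tokens. A minimal sketch, again with gpt2 as a stand-in:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"                    # left padding for batched generation
    batch = tokenizer(["Who is Bob?", "Who is Alice?"], padding=True)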

This issue is stale because it has been open for 15 days with no activity.

github-actions[bot] · Mar 24 '24

This issue was closed because it has been inactive for 2 days since being marked as stale.

github-actions[bot] · Apr 03 '24