autotrain-advanced
[BUG] Why does increasing model_max_length result in fine-tuning not working?
Prerequisites
- [X] I have read the documentation.
- [X] I have checked other issues for similar problems.
Backend
Local
Interface Used
CLI
CLI Command
--model_max_length 128 \
--block-size 128 \
and
--model_max_length 4096 \
--block-size 4096 \
UI Screenshots & Parameters
No response
Error Logs
To verify that my fine-tuning works, I train on the input 'Who is Bob?' with the output 'Bob is Jack's uncle's father's mother's granddaughter's husband'. I have 42 copies of this exact same input-output pair.
With model_max_length 128, this works flawlessly: the model overfits and reproduces the exact response when I ask it. This is just to test that the fine-tuning works.
However, when I raise model_max_length to 4096, with everything else the same, the model is unable to recall the answer anymore. Why is this happening? Does increasing block_size / model_max_length simply result in the model not learning/overfitting anymore? How do I prevent this?
@abhishekkrthakur some insights could be greatly appreciated.
Additional Information
No response
@abhishekkrthakur able to give insight on this? seems like a major bug..
does it mean increasing the model_max_length (or block_size) while keeping the data length the same will affect the fine-tuning process?
@abhishekkrthakur sorry, any insights on this?
It seems like when I increase the block-size / model_max_length during fine-tuning to be much greater than the input token length, the model is no longer able to learn from the fine-tuning (even though the loss shows it is severely overfit).
please be patient @jackshiwl . many times, immediate response is not possible :) if your sentences are small and you are using large max len, it means there will be too many padding tokens, which may account for the model not learning properly. given your data, you should choose the best hyperparameters suitable for the model you are training. this is not a bug.
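To make the padding effect concrete, here is a minimal sketch (plain Python, not autotrain code; the 40-token sample length is an illustrative assumption) of how the fraction of padding grows when a short sample is padded out to a large model_max_length:

```python
def pad_fraction(sample_len: int, max_len: int) -> float:
    """Fraction of positions that are padding when a sample of
    sample_len tokens is padded out to max_len."""
    pad = max(max_len - sample_len, 0)
    return pad / max_len

# A ~40-token sample padded to the various model_max_length values tried above.
for max_len in (128, 1024, 2048, 4096):
    print(max_len, pad_fraction(40, max_len))
```

With a ~40-token sample, padding to 4096 means roughly 99% of every training example is the pad token, versus about 69% at 128, which is consistent with the explanation above that too much padding can keep the model from learning properly.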
@abhishekkrthakur,
- For the padding argument, I have it set to the default, 'none', but it also doesn't work with padding=right or padding=left.
- Even if there is padding, I have tried to overfit severely by training for many epochs, etc. The loss goes down to an abysmally small value, but the model is still not able to recall the sample dataset (there is only 1 sample, repeated 42 times).
- I began testing from 1024, 2048, ... and it all works. But once it hits 4096, the model totally stops recalling, even if I increase the epochs drastically.
I am just worried whether using padding=none (the default) is an issue in my fine-tuning process, because some of my samples are about 500 tokens while others reach 4096 (all are trimmed to a maximum of 4096). I am not sure if this will be a problem for fine-tuning. Do you use padding for your own fine-tuning?
I would appreciate it if you could elaborate a little on which padding side you use for your own fine-tuning, and also for inference.
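One thing worth checking regardless of padding side is whether pad positions are excluded from the loss. Many causal-LM training setups mask pad positions in the labels with an ignore index (commonly -100) so they contribute nothing to the cross-entropy loss. A minimal sketch, assuming a hypothetical pad token id of 0 and the common -100 convention:

```python
PAD_ID = 0            # hypothetical pad token id (model-dependent)
IGNORE_INDEX = -100   # ignore index commonly used by cross-entropy losses

def mask_pad_labels(input_ids):
    """Copy input_ids to labels, replacing pad positions with the
    ignore index so padding contributes nothing to the loss."""
    return [tok if tok != PAD_ID else IGNORE_INDEX for tok in input_ids]

print(mask_pad_labels([5, 6, 7, 0, 0]))  # → [5, 6, 7, -100, -100]
```

If labels are not masked this way, a heavily padded batch can drive the loss to near zero just by predicting the pad token, which would match the symptom of a tiny loss with no recall.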
This issue is stale because it has been open for 15 days with no activity.
This issue was closed because it has been inactive for 2 days since being marked as stale.