
Fine-tuning result gradually becoming noise

Open knabx opened this issue 1 year ago • 4 comments

Self Checks

  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Yes. When I use the fine-tuned model, the output gradually becomes noise.

2. Additional context or comments

No response

3. Can you help us with this feature?

  • [X] I am interested in contributing to this feature.

knabx avatar Sep 23 '24 14:09 knabx

Can you share more information? Like how much data you use, the batch size, learning rate, and the step you train

Stardust-minus avatar Sep 23 '24 15:09 Stardust-minus

Can you share more information? Like how much data you use, the batch size, learning rate, and the step you train

Thanks for your reply. I used 61 wav files, with an average length of about 3 minutes each. The batch size was 2, accumulate_grad_batches was 4, and the learning rate was 1e-4; the other parameters were the same as in the documentation. I trained for 1000 steps, saving a checkpoint every 100 steps and validating the results. The loss gradually decreased and the top-5 accuracy gradually increased, eventually reaching around 0.95. However, the audio generated by the checkpoints after 300 steps gradually got worse, eventually becoming noise.
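For context, a quick back-of-the-envelope check of what those settings imply (plain arithmetic; the variable names mirror the comment, not fish-speech internals):

```python
# Settings reported in the comment above.
batch_size = 2
accumulate_grad_batches = 4

# Gradients are accumulated over 4 micro-batches, so each optimizer step
# effectively sees batch_size * accumulate_grad_batches samples.
effective_batch = batch_size * accumulate_grad_batches
print(effective_batch)  # 8

# Over 1000 optimizer steps that is 8000 sample draws, i.e. many passes
# over only 61 files, which makes overfitting plausible.
steps = 1000
print(effective_batch * steps)  # 8000
```

With so few files relative to the number of steps, degradation of later checkpoints is consistent with overfitting on top of a high learning rate.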

knabx avatar Sep 23 '24 22:09 knabx


Your learning rate is a bit too high; try 1e-5 to 5e-5. Also, the LLaMA part doesn't need many steps; about 100-300 is enough.
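The effect of an overly large learning rate can be seen even on a toy problem; a minimal sketch (generic SGD on f(w) = w², not fish-speech code):

```python
# Toy illustration: minimizing f(w) = w^2 with plain gradient descent.
# A step size that is too large makes the iterates diverge, mirroring how
# an overly high fine-tuning learning rate can degrade output into noise.
def sgd(lr, steps=50, w=1.0):
    for _ in range(steps):
        grad = 2 * w      # derivative of w^2
        w -= lr * grad
    return w

print(abs(sgd(0.01)))  # small lr: |w| shrinks toward 0
print(abs(sgd(1.1)))   # too-large lr: |w| blows up
```

The stable/unstable threshold here is lr = 1.0; real losses are far less forgiving, which is why a 5-10x reduction in learning rate is a common first fix.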

PoTaTo-Mika avatar Sep 24 '24 02:09 PoTaTo-Mika

I have changed the learning rate to 1e-5 and trained for 300 steps; the loss and top-5 accuracy are listed below. Does this look good, or should I train for more steps / increase the learning rate? train/loss=4.590, train/top_5_accuracy=0.350, val/loss=4.750, val/top_5_accuracy=0.354

BTW, there seems to be little difference between the audio generated by the fine-tuned model and the original model.
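One rough way to read those numbers, assuming the loss is a token-level cross-entropy in nats (an assumption; fish-speech's exact loss definition may differ), is to convert it to perplexity:

```python
import math

# Loss values reported in the comment above.
train_loss, val_loss = 4.590, 4.750

# perplexity = exp(cross-entropy); lower means the model is more
# confident about the next semantic token.
print(round(math.exp(train_loss)))  # 98
print(round(math.exp(val_loss)))    # 116
```

The small train/val gap suggests the model is not overfitting yet, which matches the observation that the fine-tuned output barely differs from the base model: at that learning rate and step count, the weights may simply not have moved much.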

knabx avatar Sep 24 '24 13:09 knabx

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Oct 25 '24 00:10 github-actions[bot]


Fine-tuning should work better in the next release.

AnyaCoder avatar Oct 26 '24 02:10 AnyaCoder

Hey there,

I have been fine-tuning the v1.5 model on an English dataset just to see how it behaves, and my loss curve doesn't go below 8. Batch size = 2, train data = 132 hours, LoRA rank = 8, grad_accum = 1.

Any reason for this behaviour?

abhisirka2001 avatar Jan 05 '25 13:01 abhisirka2001


The loss seems to be reasonable, just go on with your finetune.

PoTaTo-Mika avatar Jan 05 '25 13:01 PoTaTo-Mika

[Two screenshots of training loss curves]

Are the loss curves going fairly well? I do not know what the minimum loss should be before the generated samples sound good.

abhisirka2001 avatar Jan 13 '25 08:01 abhisirka2001
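To judge whether a noisy loss curve like the ones above is still trending down, a simple exponential moving average is often enough; a generic sketch, not part of fish-speech:

```python
# Smooth a noisy metric series with an exponential moving average (EMA).
# alpha controls smoothing: smaller alpha = smoother, laggier curve.
def ema(values, alpha=0.1):
    smoothed, s = [], None
    for v in values:
        s = v if s is None else alpha * v + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

# Hypothetical noisy losses hovering around 8, like those described above.
noisy = [9.0, 8.6, 8.9, 8.4, 8.7, 8.3, 8.5, 8.2]
print(ema(noisy)[-1] < noisy[0])  # True: the smoothed curve is still decreasing
```

If the smoothed curve has plateaued, the absolute loss value matters less than whether generated samples have stopped improving, so periodic listening tests on held-out prompts are the more reliable signal.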