[BUG] Normal LoRA finetune: Error(s) in loading state_dict for TextToSemantic
Describe the bug
I strictly followed the docs but I am unable to finetune my model. Right before beginning training, it throws:
[2024-08-15 03:41:49,776][fish_speech.models.text2semantic.llama][INFO] - [rank: 0] Loaded weights with error: _IncompatibleKeys(missing_keys=['embeddings.lora_A', 'embeddings.lora_B', 'codebook_embeddings.lora_A', 'codebook_embeddings.lora_B', 'layers.0.attention.wqkv.lora_A', 'layers.0.attention.wqkv.lora_B', 'layers.0.attention.wo.lora_A',
[...]
'fast_layers.3.feed_forward.w2.lora_B', 'fast_output.lora_A', 'fast_output.lora_B'], unexpected_keys=[])
To Reproduce
Create a new conda env and follow the docs for finetuning.
Expected behavior
It should recognize all keys and train correctly.
Additional context
The resulting model produces 0.2 s of audio, but it's junk. The same error occurs when merging the LoRA weights; it reports nearly every layer as unrecognized.
Fixed
Thanks!
It's happening again, with the latest docker image.
same issue
Can someone please take a look at this? Otherwise LoRA finetuning isn't possible...
@SinanAkkoyun Have you been successful in non-LoRA finetuning?
I've been experimenting. After seeing #772, I tried full finetuning (by removing the +lora part of the finetune command) and copied some of the extra config options the poster used. The LoRA warnings went away after doing this, but I was still getting 0/NaN loss and 0 accuracy. After more testing, I discovered the problem was my dataset: I was trying a finetune with some messy data just to see how it would go (this is my first time), and I had three ~1 hr audio files. After being unsuccessful, I read replies from contributors suggesting ~30 s clips, and switching to those seems to have fixed my issue (see the splitting sketch at the end of this comment).
I'm currently running a full finetune and the loss is steadily going down. I think there should be some documentation on data preparation and best practices, so that others with less experience finetuning TTS models, like me, don't accidentally waste time. Some documentation comparing full finetuning with LoRA finetuning would also help (there is only documentation on LoRA finetuning?), even if it just says that full finetuning is worse (maybe there is a reason they haven't documented it?).
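In case it helps others preparing data, here is a rough sketch of the ~30 s splitting step described above. The directory layout, the soundfile dependency, and the exact 30 s target are my own assumptions for illustration, not anything prescribed by the fish-speech docs:

```python
from pathlib import Path

import soundfile as sf  # pip install soundfile

SRC_DIR = Path("data/raw")     # long source recordings (assumed layout)
DST_DIR = Path("data/chunks")  # output directory for the ~30 s clips
CHUNK_SECONDS = 30             # rough clip length suggested in this thread

DST_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(SRC_DIR.glob("*.wav")):
    audio, sr = sf.read(str(wav_path))
    samples_per_chunk = CHUNK_SECONDS * sr
    for i in range(0, len(audio), samples_per_chunk):
        chunk = audio[i : i + samples_per_chunk]
        if len(chunk) < sr:  # drop tiny leftovers shorter than one second
            continue
        out_path = DST_DIR / f"{wav_path.stem}_{i // samples_per_chunk:04d}.wav"
        sf.write(str(out_path), chunk, sr)
```

Note that this cuts at fixed offsets, so clips can end mid-word; splitting on silence (for example with a VAD) would likely give cleaner training data.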
@macklinhrw Thanks for your input; however, I am aiming to do a LoRA finetune. Full FT works for me.
I have been trying to train a LoRA with ~1500 individual 10-1 second audio files, and I still get the NaN error on the latest master.
@Whale-Dolphin Was this fixed?
Yes, we've fixed it. If it works with some warnings, just ignore them.
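For anyone landing here later: missing lora_A/lora_B keys like the ones in the log above are expected when a base (non-LoRA) checkpoint is loaded non-strictly into a model that has just had LoRA layers attached; the adapters simply start from their initial values. A generic PyTorch sketch of the effect (the LoRALinear class here is illustrative, not fish-speech's actual implementation):

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: a frozen base projection plus a low-rank update."""

    def __init__(self, in_features: int, out_features: int, r: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        # Newly created adapter weights -- these have no counterpart in a
        # checkpoint that was saved before LoRA was attached.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


model = nn.ModuleDict({"wqkv": LoRALinear(16, 16)})

# A base checkpoint only knows about the original projection weight.
base_ckpt = {"wqkv.base.weight": torch.randn(16, 16)}

# Non-strict loading succeeds; the returned report just lists the fresh adapter
# parameters, mirroring the _IncompatibleKeys(missing_keys=[...lora_A/lora_B...])
# lines in the log above.
result = model.load_state_dict(base_ckpt, strict=False)
print(result.missing_keys)     # ['wqkv.lora_A', 'wqkv.lora_B']
print(result.unexpected_keys)  # []
```

So as long as the base weights themselves load cleanly (no unexpected_keys), the warning is harmless and training proceeds with freshly initialized adapters.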