[BUG] Normal LoRA finetune: Error(s) in loading state_dict for TextToSemantic
Describe the bug
I strictly followed the docs but I am unable to finetune my model. Right before beginning training, it throws:
[2024-08-15 03:41:49,776][fish_speech.models.text2semantic.llama][INFO] - [rank: 0] Loaded weights with error: _IncompatibleKeys(missing_keys=['embeddings.lora_A', 'embeddings.lora_B', 'codebook_embeddings.lora_A', 'codebook_embeddings.lora_B', 'layers.0.attention.wqkv.lora_A', 'layers.0.attention.wqkv.lora_B', 'layers.0.attention.wo.lora_A',
[...]
'fast_layers.3.feed_forward.w2.lora_B', 'fast_output.lora_A', 'fast_output.lora_B'], unexpected_keys=[])
To Reproduce
Create a new conda env and follow the docs for finetuning.
Expected behavior
It should recognize all keys and train correctly.
Additional context
The resulting model produces 0.2 s of audio, but it's junk. The same error occurs when merging the LoRA weights; it reports nearly every layer as unrecognized.
Fixed
Thanks!
It's happening again, with the latest docker image.
same issue
Can someone please take a look at this? Otherwise LoRA finetuning isn't possible...
@SinanAkkoyun Have you been successful in non-LoRA finetuning?
I've been experimenting. After seeing #772, I tried full finetuning (by removing the +lora part of the finetune command) and copied some of the extra config options the poster used. The LoRA warnings went away after doing this, but I was still getting 0/NaN loss and 0 accuracy. After more testing, I discovered the problem was my dataset: I was trying a finetune with some messy data just to see how it would go (this is my first time), and I had three ~1 hr audio files. After being unsuccessful, I read replies from contributors suggesting ~30 s clips, and switching to those seems to have fixed my issue (see the splitting sketch at the end of this comment).
I'm currently running a full finetune and the loss is steadily going down. I think there should be some documentation on data preparation and best practices, so that others with less experience finetuning TTS models, like me, don't accidentally waste time. Some documentation comparing full finetuning with LoRA finetuning would also help (there is only documentation on LoRA finetuning?), even if it just says that full finetuning is worse (maybe there is a reason they haven't documented it?).
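In case it helps others preparing data, here is a rough sketch of the ~30 s splitting step described above. The directory layout, the soundfile dependency, and the exact 30 s target are my own assumptions for illustration, not anything prescribed by the fish-speech docs:

```python
from pathlib import Path

import soundfile as sf  # pip install soundfile

SRC_DIR = Path("data/raw")     # long source recordings (assumed layout)
DST_DIR = Path("data/chunks")  # output directory for the ~30 s clips
CHUNK_SECONDS = 30             # rough clip length suggested in this thread

DST_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(SRC_DIR.glob("*.wav")):
    audio, sr = sf.read(str(wav_path))
    samples_per_chunk = CHUNK_SECONDS * sr
    for i in range(0, len(audio), samples_per_chunk):
        chunk = audio[i : i + samples_per_chunk]
        if len(chunk) < sr:  # drop tiny leftovers shorter than one second
            continue
        out_path = DST_DIR / f"{wav_path.stem}_{i // samples_per_chunk:04d}.wav"
        sf.write(str(out_path), chunk, sr)
```

Note that this cuts at fixed offsets, so clips can end mid-word; splitting on silence (for example with a VAD) would likely give cleaner training data.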
@macklinhrw Thanks for your input; however, I am aiming to do a LoRA finetune. Full FT works for me.
I have been trying to train a LoRA with ~1500 individual 10-1 second audio files, and I still get the NaN error on the latest master.
@Whale-Dolphin Was this fixed?
Yes, we've fixed it. If it works with some warnings, just ignore them.
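For anyone landing here later: missing lora_A/lora_B keys like the ones in the log above are expected when a base (non-LoRA) checkpoint is loaded non-strictly into a model that has just had LoRA layers attached; the adapters simply start from their initial values. A generic PyTorch sketch of the effect (the LoRALinear class here is illustrative, not fish-speech's actual implementation):

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: a frozen base projection plus a low-rank update."""

    def __init__(self, in_features: int, out_features: int, r: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        # Newly created adapter weights -- these have no counterpart in a
        # checkpoint that was saved before LoRA was attached.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


model = nn.ModuleDict({"wqkv": LoRALinear(16, 16)})

# A base checkpoint only knows about the original projection weight.
base_ckpt = {"wqkv.base.weight": torch.randn(16, 16)}

# Non-strict loading succeeds; the returned report just lists the fresh adapter
# parameters, mirroring the _IncompatibleKeys(missing_keys=[...lora_A/lora_B...])
# lines in the log above.
result = model.load_state_dict(base_ckpt, strict=False)
print(result.missing_keys)     # ['wqkv.lora_A', 'wqkv.lora_B']
print(result.unexpected_keys)  # []
```

So as long as the base weights themselves load cleanly (no unexpected_keys), the warning is harmless and training proceeds with freshly initialized adapters.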