CosyVoice 3 training
I ran into several issues during training. First, a note for the author: the cosyvoice3.yaml on GitHub is wrong. Please replace it with the actual parameter settings used by the released model file.
I hit the following errors during training. How can I solve them?
Root Cause (first observed failure):
[0]:
time : 2025-12-16_10:21:42
host : localhost
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1114294)
error_file: /tmp/torchelastic_fd0l21ot/1986_u1j95bi_/attempt_0/0/error.json
traceback : Traceback (most recent call last):
File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
return f(*args, **kwargs)
File "/home/lxy/tts_project/cosyvoice3/CosyVoice/examples/libritts/cosyvoice3/cosyvoice/bin/train.py", line 190, in main
executor.train_one_epoc(model, optimizer, scheduler, train_data_loader, cv_data_loader, writer, info_dict, scaler, group_join, ref_model=ref_model)
File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/executor.py", line 72, in train_one_epoc
info_dict = batch_forward(model, batch_dict, scaler, info_dict, ref_model=self.ref_model, dpo_loss=self.dpo_loss)
File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/train_utils.py", line 255, in batch_forward
info_dict['loss_dict'] = model(batch, device)
File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1661, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1487, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/llm/llm.py", line 701, in forward
acc = th_accuracy(logits.view(-1, self.speech_token_size + 3), lm_target, ignore_label=IGNORE_ID)
RuntimeError: shape '[-1, 6564]' is invalid for input of size 8897476
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/lxy/tts_project/cosyvoice3/CosyVoice/examples/libritts/cosyvoice3/cosyvoice/bin/train.py", line 195, in <module>
[rank0]: main()
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
[rank0]: return f(*args, **kwargs)
[rank0]: File "/home/lxy/tts_project/cosyvoice3/CosyVoice/examples/libritts/cosyvoice3/cosyvoice/bin/train.py", line 190, in main
[rank0]: executor.train_one_epoc(model, optimizer, scheduler, train_data_loader, cv_data_loader, writer, info_dict, scaler, group_join, ref_model=ref_model)
[rank0]: File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/executor.py", line 72, in train_one_epoc
[rank0]: info_dict = batch_forward(model, batch_dict, scaler, info_dict, ref_model=self.ref_model, dpo_loss=self.dpo_loss)
[rank0]: File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/train_utils.py", line 255, in batch_forward
[rank0]: info_dict['loss_dict'] = model(batch, device)
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1661, in forward
[rank0]: else self._run_ddp_forward(*inputs, **kwargs)
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1487, in _run_ddp_forward
[rank0]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/flow/flow.py", line 335, in forward
[rank0]: h, h_lengths = self.encoder(token, token_len, streaming=streaming)
[rank0]: File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1964, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: 'CausalMaskedDiffWithDiT' object has no attribute 'encoder'. Did you mean: 'decoder'?
Change `+3` to `+200`.
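The shape error in the first traceback can be checked with plain arithmetic. The sketch below assumes `speech_token_size` is 6561, which is inferred from the `[-1, 6564]` shape in the error message (6564 - 3) rather than read from the config. It only shows that the reported element count fits a last dimension of `speech_token_size + 200` and not `speech_token_size + 3`. It does not confirm that `+200` is the intended fix in the upstream code.

```python
# Element count reported by: "shape '[-1, 6564]' is invalid for input of size 8897476"
NUMEL = 8897476

# Assumption: inferred from the traceback (6564 - 3), not read from cosyvoice3.yaml.
SPEECH_TOKEN_SIZE = 6564 - 3


def rows_if_viewable(numel, width):
    """Return the row count if `numel` elements reshape to [-1, width], else None."""
    rows, remainder = divmod(numel, width)
    return rows if remainder == 0 else None


for extra in (3, 200):
    width = SPEECH_TOKEN_SIZE + extra
    print(f"+{extra}: width={width}, rows={rows_if_viewable(NUMEL, width)}")
```

With `+3` the element count is not divisible by the width, so `view` fails. With `+200` it divides evenly into 1316 rows, which suggests the logits really have 6761 classes.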
Flow training is broken too. I suspect they never reproduced training themselves.
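Flow training is broken too. They probably could not reproduce training themselves.
`self.encoder` is not even defined. I assume it is the same as before, but this release feels rushed, and the hyperparameters in cosyvoice3.yaml were copied straight from CosyVoice 2.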
Yes, the yaml needs to be rewritten to match the model.
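One way to find which yaml settings disagree with the released model is to compare parameter shapes. This is a minimal sketch, not the project's tooling: the helper `find_shape_mismatches` and the parameter names and hidden size below are hypothetical, and only the widths 6564 and 6761 come from this thread. With PyTorch, the two dicts could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the model built from the yaml and from the loaded checkpoint.

```python
def find_shape_mismatches(built, released):
    """Map each differing parameter name to (built_shape, released_shape).

    A shape of None means the parameter is missing on that side.
    """
    mismatches = {}
    for name in sorted(set(built) | set(released)):
        built_shape = built.get(name)
        released_shape = released.get(name)
        if built_shape != released_shape:
            mismatches[name] = (built_shape, released_shape)
    return mismatches


# Hypothetical parameter names and hidden size (8), for illustration only.
built_from_yaml = {"head.weight": (6564, 8), "embed.weight": (6564, 8), "norm.weight": (8,)}
from_checkpoint = {"head.weight": (6761, 8), "embed.weight": (6761, 8), "norm.weight": (8,)}

for name, (yaml_shape, ckpt_shape) in find_shape_mismatches(built_from_yaml, from_checkpoint).items():
    print(f"{name}: yaml gives {yaml_shape}, checkpoint has {ckpt_shape}")
```

Each reported mismatch points at a hyperparameter in the yaml that has to change before the checkpoint can be loaded for fine-tuning.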
mark
https://github.com/FunAudioLLM/CosyVoice/pull/1700 is a quick adaptation. I ran it on a small dataset and it seems to work.
Once flow3 is trained, could you check whether it supports half-precision or quantized inference?
gml was open-sourced on December 11, so I guess the December 15 release was a last-minute decision.