
cosyvoice 3 train

Open 828Tina opened this issue 3 weeks ago • 9 comments

I ran into several problems during training. First, a note to the authors: the cosyvoice3.yaml on GitHub appears to be wrong. Please replace it with the actual parameter settings that match the released model files.

I encountered the following problems during training. Could you please tell me how to solve them?

Root Cause (first observed failure):
[0]:
  time      : 2025-12-16_10:21:42
  host      : localhost
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1114294)
  error_file: /tmp/torchelastic_fd0l21ot/1986_u1j95bi_/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
    File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
      return f(*args, **kwargs)
    File "/home/lxy/tts_project/cosyvoice3/CosyVoice/examples/libritts/cosyvoice3/cosyvoice/bin/train.py", line 190, in main
      executor.train_one_epoc(model, optimizer, scheduler, train_data_loader, cv_data_loader, writer, info_dict, scaler, group_join, ref_model=ref_model)
    File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/executor.py", line 72, in train_one_epoc
      info_dict = batch_forward(model, batch_dict, scaler, info_dict, ref_model=self.ref_model, dpo_loss=self.dpo_loss)
    File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/train_utils.py", line 255, in batch_forward
      info_dict['loss_dict'] = model(batch, device)
    File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
      return forward_call(*args, **kwargs)
    File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1661, in forward
      else self._run_ddp_forward(*inputs, **kwargs)
    File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1487, in _run_ddp_forward
      return self.module(*inputs, **kwargs)  # type: ignore[index]
    File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
      return forward_call(*args, **kwargs)
    File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/llm/llm.py", line 701, in forward
      acc = th_accuracy(logits.view(-1, self.speech_token_size + 3), lm_target, ignore_label=IGNORE_ID)
  RuntimeError: shape '[-1, 6564]' is invalid for input of size 8897476
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/lxy/tts_project/cosyvoice3/CosyVoice/examples/libritts/cosyvoice3/cosyvoice/bin/train.py", line 195, in <module>
[rank0]:     main()
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
[rank0]:     return f(*args, **kwargs)
[rank0]:   File "/home/lxy/tts_project/cosyvoice3/CosyVoice/examples/libritts/cosyvoice3/cosyvoice/bin/train.py", line 190, in main
[rank0]:     executor.train_one_epoc(model, optimizer, scheduler, train_data_loader, cv_data_loader, writer, info_dict, scaler, group_join, ref_model=ref_model)
[rank0]:   File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/executor.py", line 72, in train_one_epoc
[rank0]:     info_dict = batch_forward(model, batch_dict, scaler, info_dict, ref_model=self.ref_model, dpo_loss=self.dpo_loss)
[rank0]:   File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/utils/train_utils.py", line 255, in batch_forward
[rank0]:     info_dict['loss_dict'] = model(batch, device)
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1661, in forward
[rank0]:     else self._run_ddp_forward(*inputs, **kwargs)
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1487, in _run_ddp_forward
[rank0]:     return self.module(*inputs, **kwargs)  # type: ignore[index]
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/lxy/tts_project/cosyvoice3/CosyVoice/cosyvoice/flow/flow.py", line 335, in forward
[rank0]:     h, h_lengths = self.encoder(token, token_len, streaming=streaming)
[rank0]:   File "/home/lxy/miniconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1964, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: 'CausalMaskedDiffWithDiT' object has no attribute 'encoder'. Did you mean: 'decoder'?

828Tina, Dec 16 '25 02:12

Change the +3 to +200 (in self.speech_token_size + 3).
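A quick arithmetic check is consistent with this suggestion. The sketch below uses only the numbers from the error message above; speech_token_size = 6561 is inferred from 6564 = speech_token_size + 3 and is an assumption, as is the helper name `fits`. It only checks divisibility, so it does not prove that +200 is the intended value for the released model.

```python
# Numbers taken from the RuntimeError in this thread.
numel = 8897476            # total number of elements in `logits`
speech_token_size = 6561   # assumption: inferred from 6564 = speech_token_size + 3


def fits(last_dim: int) -> bool:
    """tensor.view(-1, last_dim) requires numel to be divisible by last_dim."""
    return numel % last_dim == 0


old_dim = speech_token_size + 3     # 6564, the value that raised the error
new_dim = speech_token_size + 200   # 6761, the value suggested in this reply

print(old_dim, fits(old_dim))
print(new_dim, fits(new_dim), numel // new_dim)
```

With +200 the element count divides evenly, which suggests the final dimension of the logits is 6761 rather than 6564.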

wang-b22, Dec 16 '25 06:12

Flow training is broken too. I suspect they never reproduced the training run themselves.

wang-b22, Dec 16 '25 07:12

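Flow training is broken too. I suspect they could not reproduce the training run themselves.

self.encoder is not even defined. I would guess it is the same as before, but this release feels very rushed. Also, the hyperparameters in cosyvoice3.yaml were copied straight from CosyVoice 2.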

828Tina, Dec 16 '25 07:12

Yes, the yaml has to be revised against the actual model.

Li-Dongcheng, Dec 16 '25 09:12

mark

JohnHerry, Dec 17 '25 02:12

mark

LuffyGT, Dec 18 '25 06:12

https://github.com/FunAudioLLM/CosyVoice/pull/1700 I made a simple adaptation here. I ran it on a small dataset and it seems to work fine.

aaron-lii, Dec 18 '25 06:12

Once flow3 is trained, could you please check whether it supports half-precision quantized inference?

JohnHerry, Dec 18 '25 07:12

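Flow training is broken too. I suspect they could not reproduce the training run themselves.

self.encoder is not even defined. I would guess it is the same as before, but this release feels very rushed. Also, the hyperparameters in cosyvoice3.yaml were copied straight from CosyVoice 2.

gml was open-sourced on December 11, so I would guess the decision to open-source on December 15 was made at the last minute.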

ypatz, Dec 18 '25 07:12