unsloth [Bug] Orpheus_tts Spanish finetune cannot generate valid voice

changes:

model changed to canopylabs/3b-es_it-ft-research_release
max length: 3200
```
def redistribute_codes(code_list):
    if len(code_list) == 0:
        print("Warning: Empty code list, returning silence")
        return torch.zeros(1, 1, 24000)  # 1 second of silence

    layer_1 = []
    layer_2 = []
    layer_3 = []

    for i in range(len(code_list) // 7):
        try:
            c0 = code_list[7*i]
            c1 = code_list[7*i+1] - 4096
            c2 = code_list[7*i+2] - (2*4096)
            c3 = code_list[7*i+3] - (3*4096)
            c4 = code_list[7*i+4] - (4*4096)
            c5 = code_list[7*i+5] - (5*4096)
            c6 = code_list[7*i+6] - (6*4096)

            # Check the range and clamp
            c0 = max(0, min(c0, 4095))
            c1 = max(0, min(c1, 4095))
            c2 = max(0, min(c2, 4095))
            c3 = max(0, min(c3, 4095))
            c4 = max(0, min(c4, 4095))
            c5 = max(0, min(c5, 4095))
            c6 = max(0, min(c6, 4095))

            layer_1.append(c0)
            layer_2.append(c1)
            layer_3.append(c2)
            layer_3.append(c3)
            layer_2.append(c4)
            layer_3.append(c5)
            layer_3.append(c6)

        except Exception as e:
            print(f"Error at frame {i}: {e}")
            continue

    if len(layer_1) == 0:
        print("Warning: No valid codes decoded, returning silence")
        return torch.zeros(1, 1, 24000)

    codes = [
        torch.tensor(layer_1, dtype=torch.long).unsqueeze(0),
        torch.tensor(layer_2, dtype=torch.long).unsqueeze(0),
        torch.tensor(layer_3, dtype=torch.long).unsqueeze(0),
    ]

    audio_hat = snac_model.decode(codes)
    return audio_hat
```

It only generates silent audio.
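One way to narrow this down, sketched under assumptions: count how many generated codes fall outside the 0..4095 range before any clamping is applied. The sketch assumes the same layout as the function above (7 codes per frame, with position k offset by k * 4096). `count_out_of_range` is a hypothetical helper name, not part of Unsloth or SNAC.

```python
CODEBOOK_SIZE = 4096   # per-position code range, assumed from the function above
CODES_PER_FRAME = 7    # assumed frame size, as in the function above


def count_out_of_range(code_list):
    """Return (codes_checked, out_of_range_count) over complete frames only."""
    n_frames = len(code_list) // CODES_PER_FRAME
    bad = 0
    for i in range(n_frames):
        for k in range(CODES_PER_FRAME):
            # Remove the per-position offset, then test the valid range.
            value = code_list[CODES_PER_FRAME * i + k] - k * CODEBOOK_SIZE
            if not 0 <= value < CODEBOOK_SIZE:
                bad += 1
    return n_frames * CODES_PER_FRAME, bad
```

If a large share of the codes is out of range, clamping would replace them with arbitrary in-range values instead of surfacing the problem, which may explain silent or meaningless audio without any error.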

Oct 30 '25 12:10 yxk9810

@Etherll Unsure if you know anything about this

Nov 01 '25 12:11 danielhanchen

Can you share your setup, like a notebook link? I tried the official notebook and it worked fine. I set model_name = 'canopylabs/3b-es_it-ft-research_release' and trained for one epoch using the ylacombe/google-argentinian-spanish dataset; it already sounds pretty good to me.

Here's the audio after fine-tuning for reference: https://vocaroo.com/15Kfj81345kI

I think the problem is due to your redistribute_codes function changes
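For comparison, a torch-free sketch of the same de-interleaving without the clamping and try/except, assuming the 7-codes-per-frame layout used in the function posted above. `split_layers` is a hypothetical name; the three lists would still need to be converted to tensors before being passed to `snac_model.decode`.

```python
def split_layers(code_list):
    """Split a flat code list into three layers (assumed 7 codes per frame)."""
    layer_1, layer_2, layer_3 = [], [], []
    for i in range(len(code_list) // 7):
        frame = code_list[7 * i : 7 * i + 7]
        # Position k carries an offset of k * 4096; no clamping is applied,
        # so out-of-range values stay visible to the caller.
        layer_1.append(frame[0])
        layer_2.append(frame[1] - 4096)
        layer_3.append(frame[2] - 2 * 4096)
        layer_3.append(frame[3] - 3 * 4096)
        layer_2.append(frame[4] - 4 * 4096)
        layer_3.append(frame[5] - 5 * 4096)
        layer_3.append(frame[6] - 6 * 4096)
    return layer_1, layer_2, layer_3
```

Each complete frame contributes one code to the first layer, two to the second, and four to the third; trailing codes that do not fill a frame are ignored.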

Nov 01 '25 23:11 Etherll

Sorry for the late reply. This is the notebook I used: https://colab.research.google.com/drive/1sINXJCjZFPQDtUD9nYBdS3vRkKndLgsJ?usp=sharing The data I used can be downloaded from this link: https://drive.google.com/file/d/1rpMGSQLcdas9oM6xQ31rsvtp_KU7BxtF/view?usp=sharing I'm new to speech fine-tuning and would greatly appreciate it if you could help me.

@Etherll

Nov 13 '25 08:11 yxk9810

I'm facing a strange problem. I trained the model on a custom dataset from a female speaker, but the output voice is male. My dataset was created by splitting a female speaker's audio into 10 s clips and transcribing them with ASR. I trained for 20 steps, and the training process completes without errors. When I tested with the official female dataset, it worked correctly and produced a female voice, which suggests the issue is likely with my custom dataset. Could you provide any insights on what might be going wrong? Thanks. @Etherll, you can find my notebook here: https://colab.research.google.com/drive/1V0b2wza_SGx9hs26g-JNov_cufS5HIq6?usp=sharing

Nov 14 '25 07:11 yxk9810
