unsloth
[Bug] Orpheus_tts Spanish fine-tune cannot generate valid voice
Changes:
- model changed to canopylabs/3b-es_it-ft-research_release
- max length: 3200
- redistribute_codes changed to:
def redistribute_codes(code_list):
    if len(code_list) == 0:
        print("Warning: Empty code list, returning silence")
        return torch.zeros(1, 1, 24000)  # 1 second of silence

    layer_1 = []
    layer_2 = []
    layer_3 = []

    for i in range(len(code_list) // 7):
        try:
            c0 = code_list[7*i]
            c1 = code_list[7*i+1] - 4096
            c2 = code_list[7*i+2] - (2*4096)
            c3 = code_list[7*i+3] - (3*4096)
            c4 = code_list[7*i+4] - (4*4096)
            c5 = code_list[7*i+5] - (5*4096)
            c6 = code_list[7*i+6] - (6*4096)

            # Check the range and clamp
            c0 = max(0, min(c0, 4095))
            c1 = max(0, min(c1, 4095))
            c2 = max(0, min(c2, 4095))
            c3 = max(0, min(c3, 4095))
            c4 = max(0, min(c4, 4095))
            c5 = max(0, min(c5, 4095))
            c6 = max(0, min(c6, 4095))

            layer_1.append(c0)
            layer_2.append(c1)
            layer_3.append(c2)
            layer_3.append(c3)
            layer_2.append(c4)
            layer_3.append(c5)
            layer_3.append(c6)
        except Exception as e:
            print(f"Error at frame {i}: {e}")
            continue

    if len(layer_1) == 0:
        print("Warning: No valid codes decoded, returning silence")
        return torch.zeros(1, 1, 24000)

    codes = [
        torch.tensor(layer_1, dtype=torch.long).unsqueeze(0),
        torch.tensor(layer_2, dtype=torch.long).unsqueeze(0),
        torch.tensor(layer_3, dtype=torch.long).unsqueeze(0),
    ]

    audio_hat = snac_model.decode(codes)
    return audio_hat
With these changes, the model only generates silent audio.
@Etherll Unsure if you know anything about this
Can you share your setup, e.g. a notebook link?
I tried the official notebook and it worked fine. I set model_name = 'canopylabs/3b-es_it-ft-research_release' and trained for one epoch using the ylacombe/google-argentinian-spanish dataset. It already sounds pretty good to me.
Here's the audio after fine-tuning for reference: https://vocaroo.com/15Kfj81345kI
I think the problem is due to your redistribute_codes function changes
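One way to test that suspicion is to count how many codes fall outside 0..4095 before any clamping is applied. The following is only a minimal sketch: it assumes each position `pos` within a 7-token frame is offset by `pos * 4096`, as in the function posted above, and the helper name `count_out_of_range` and the sample frames are hypothetical, not part of the notebook.

```python
CODEBOOK_SIZE = 4096  # assumed from the 4096 offsets in redistribute_codes
FRAME_SIZE = 7        # tokens per frame, as in redistribute_codes


def count_out_of_range(code_list):
    """Return (total, out_of_range) over complete 7-token frames, before clamping."""
    total = 0
    bad = 0
    for i in range(len(code_list) // FRAME_SIZE):
        for pos in range(FRAME_SIZE):
            # Undo the per-position offset, exactly as redistribute_codes does.
            value = code_list[FRAME_SIZE * i + pos] - pos * CODEBOOK_SIZE
            total += 1
            if not 0 <= value < CODEBOOK_SIZE:
                bad += 1
    return total, bad


# Hypothetical input: one well-formed frame followed by one frame of zeros.
good_frame = [pos * CODEBOOK_SIZE + 5 for pos in range(FRAME_SIZE)]
bad_frame = [0] * FRAME_SIZE
total, bad = count_out_of_range(good_frame + bad_frame)
print(f"{bad} of {total} codes out of range")
```

If a large share of the codes is out of range, the clamp may be hiding malformed token sequences rather than fixing them, in which case the tokens passed to redistribute_codes would be the thing to inspect.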
Sorry for the late reply. This is the notebook I used:
https://colab.research.google.com/drive/1sINXJCjZFPQDtUD9nYBdS3vRkKndLgsJ?usp=sharing
The data I used can be downloaded from this link:
https://drive.google.com/file/d/1rpMGSQLcdas9oM6xQ31rsvtp_KU7BxtF/view?usp=sharing
I'm new to speech fine-tuning and would greatly appreciate it if you could help me.
@Etherll
I'm facing a strange problem. I trained the model on a custom dataset from a female speaker, but the output voice is male. My dataset was created by splitting a female speaker's audio into 10 s clips and transcribing them with ASR. I trained for 20 steps, and the training process completes without errors. I also tested with the official female dataset, and it works correctly, producing a female voice. This suggests the issue is likely with my custom dataset. Could you provide any insights on what might be going wrong? Thanks. @Etherll, you can find my notebook here: https://colab.research.google.com/drive/1V0b2wza_SGx9hs26g-JNov_cufS5HIq6?usp=sharing