Abnormal Audio Generation in mlx-community/Dia-1.6B - [S1] and [S2] Voices Not Working Properly

Open • zhaopengme opened this issue 9 months ago • 1 comment

I'm encountering issues with audio generation using the mlx-community/Dia-1.6B model. Here's a detailed description of the problem:

Reproduction Steps:

Use the following configuration:

# Import path assumes the mlx-audio package
from mlx_audio.tts.generate import generate_audio

generate_audio(
    text=("[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."),
    model_path="mlx-community/Dia-1.6B",
    file_prefix="audiobook_chapter1",
    join_audio=True,
    verbose=True
)

Expected Behavior:

Properly generated audio with distinct [S1] and [S2] voices
Natural speech synthesis with appropriate emphasis

Actual Behavior:

Abnormal audio output (see attached file)
Voices seem to overlap or produce distorted sounds
Text-to-speech conversion appears inconsistent

Additional Information:

Audio file (fixed format): audio.mp3.txt

GitHub doesn't allow MP3 uploads, so please rename audio.mp3.txt to audio.mp3. Thanks!

Environment: [Please specify your OS, Python version, and library versions if applicable]
Error logs: [Include any console output if available]

This issue prevents proper use of the model for dialogue generation. Could you please investigate?
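For anyone filling in the Environment field above, a minimal sketch like the following could collect the details. The package names "mlx" and "mlx-audio" are assumptions about what is installed; swap in whatever your setup actually uses.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("mlx", "mlx-audio")):
    """Return OS, Python version, and installed versions of the given packages.

    The default package names are assumptions, not taken from the issue.
    """
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_environment())
```

The printed text can be pasted directly into the Environment field.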

Thank you!

May 17 '25 10:05 zhaopengme

Thanks @zhaopengme!

Are you using the main branch?

I ask because we recently refactored sample_rate to default to each model's recommended rate. Read more in PR #148.

May 17 '25 11:05 Blaizzy
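
To illustrate why a sample-rate default could matter here, the sketch below uses only the Python standard library to show what happens when audio synthesized at one rate is saved with a different rate in the file header. The specific rates are assumptions for illustration (44100 Hz as the model's native rate, 24000 Hz as a mismatched default), and the helper names are hypothetical. This is not the library's code and not a confirmed diagnosis of this issue.

```python
import io
import math
import wave


def write_tone(header_rate, true_rate=44100, seconds=1.0, freq=440.0):
    """Synthesize a sine tone at true_rate but label the WAV with header_rate."""
    n_samples = int(true_rate * seconds)
    frames = bytearray()
    for i in range(n_samples):
        value = int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / true_rate))
        frames += value.to_bytes(2, "little", signed=True)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 16-bit PCM
        w.setframerate(header_rate)  # the rate a player will trust
        w.writeframes(bytes(frames))
    buf.seek(0)
    return buf


def playback_seconds(buf):
    """Duration a player would report, based on the header's sample rate."""
    with wave.open(buf, "rb") as r:
        return r.getnframes() / r.getframerate()


if __name__ == "__main__":
    print(playback_seconds(write_tone(44100)))  # header matches the true rate
    print(playback_seconds(write_tone(24000)))  # header understates the rate
```

With a matching header the one-second tone plays for one second. With the mismatched header the same samples play back slower and at a lower pitch, which is one way speech can end up sounding distorted. If the installed version predates the refactor, updating to the main branch is the first thing to check.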