
Abnormal Audio Generation in mlx-community/Dia-1.6B - [S1] and [S2] Voices Not Working Properly

Open • zhaopengme opened this issue 9 months ago • 1 comment

I'm encountering issues with audio generation using the mlx-community/Dia-1.6B model. Here's a detailed description of the problem:

Reproduction Steps:

  1. Use the following configuration:
from mlx_audio.tts.generate import generate_audio

generate_audio(
    # [S1] and [S2] tag the two speakers in the dialogue
    text="[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face.",
    model_path="mlx-community/Dia-1.6B",
    file_prefix="audiobook_chapter1",
    join_audio=True,
    verbose=True
)

Expected Behavior:

  • Properly generated audio with distinct [S1] and [S2] voices
  • Natural speech synthesis with appropriate emphasis

Actual Behavior:

  • Abnormal audio output (see attached file)
  • Voices seem to overlap or produce distorted sounds
  • Text-to-speech conversion appears inconsistent

Additional Information:

  1. Audio file (fixed format): audio.mp3.txt

GitHub doesn't allow MP3 uploads, so please rename audio.mp3.txt to audio.mp3. Thanks!

  2. Environment: [Please specify your OS, Python version, and library versions if applicable]
  3. Error logs: [Include any console output if available]

This issue prevents proper use of the model for dialogue generation. Could you please investigate?

Thank you!
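
The environment details left as a placeholder in the report can be gathered with a short script. This is a minimal sketch using only the standard library; the distribution names `mlx-audio` and `mlx` are assumptions about how the packages were installed, and `environment_report` is a hypothetical helper, not part of mlx-audio.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("mlx-audio", "mlx")):
    """Build a short text block with OS, Python, and package versions."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    # Package names are assumed; adjust them to match your installation.
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


print(environment_report())
```

Pasting the printed block into the issue gives maintainers the version information needed to tell whether a fix on the main branch is already installed.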

zhaopengme • May 17 '25 10:05

Thanks @zhaopengme!

Are you using the main branch?

We recently refactored sample_rate to default to each model's recommended rate. You can read more in PR #148.
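
To illustrate why the sample rate matters, here is a minimal standard-library sketch of what happens when samples produced at one rate are saved with a header that declares another. The thread does not confirm this as the cause of the reported distortion, and the two rates below (44,100 Hz and 24,000 Hz) are illustrative assumptions, not values taken from mlx-audio.

```python
import io
import math
import struct
import wave


def write_wav(samples, header_rate):
    """Pack float samples in [-1, 1] as 16-bit mono PCM with the given header rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(header_rate)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav.writeframes(frames)
    return buf.getvalue()


def playback_seconds(wav_bytes):
    """Duration a player would report, based on the rate stored in the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


GENERATED_RATE = 44_100  # assumed rate at which the samples were produced
HEADER_RATE = 24_000     # assumed mismatched rate written to the file

# One second of a 440 Hz tone, sampled at GENERATED_RATE.
tone = [math.sin(2 * math.pi * 440 * n / GENERATED_RATE) for n in range(GENERATED_RATE)]

correct = playback_seconds(write_wav(tone, GENERATED_RATE))
mismatched = playback_seconds(write_wav(tone, HEADER_RATE))

print(f"matching header:   {correct:.2f} s")
print(f"mismatched header: {mismatched:.2f} s")
```

With a mismatched header the same samples play back over a longer span, so speech sounds slowed down and lower in pitch. Checking the rate stored in the generated file against the model's recommended rate is a quick way to rule this out.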

Blaizzy • May 17 '25 11:05