mlx-audio
Abnormal Audio Generation in mlx-community/Dia-1.6B - [S1] and [S2] Voices Not Working Properly
I'm encountering issues with audio generation using the mlx-community/Dia-1.6B model. Here's a detailed description of the problem:
Reproduction Steps:
- Use the following configuration:
from mlx_audio.tts.generate import generate_audio

generate_audio(
    text=("[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."),
    model_path="mlx-community/Dia-1.6B",
    file_prefix="audiobook_chapter1",
    join_audio=True,
    verbose=True
)
Expected Behavior:
- Properly generated audio with distinct [S1] and [S2] voices
- Natural speech synthesis with appropriate emphasis
Actual Behavior:
- Abnormal audio output (see attached file)
- Voices seem to overlap or produce distorted sounds
- Text-to-speech conversion appears inconsistent
Additional Information:
- Audio file: audio.mp3.txt (GitHub doesn't allow .mp3 uploads, so please rename audio.mp3.txt to audio.mp3 to play it. Thanks!)
- Environment: [Please specify your OS, Python version, and library versions if applicable]
- Error logs: [Include any console output if available]
This issue prevents proper use of the model for dialogue generation. Could you please investigate?
Thank you!
Thanks @zhaopengme!
Are you using the main branch?
I ask because we recently refactored sample_rate to default to each model's recommended rate. You can read more in PR #148.
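If the distortion does come from a sample-rate mismatch, the effect can be illustrated without the model at all. The sketch below is only an illustration, not mlx-audio code: it uses Python's standard `wave` module, and both rates are assumptions (44,100 Hz as the rate the model is assumed to produce, 24,000 Hz as a hypothetical wrong rate written into the file header). The same samples tagged with a lower rate play back slower and lower in pitch.

```python
import io
import wave


def wav_duration_seconds(num_samples: int, header_rate: int) -> float:
    """Write silent 16-bit mono PCM to an in-memory WAV file and
    return the duration a player would compute from its header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)       # mono
        writer.setsampwidth(2)       # 16-bit samples
        writer.setframerate(header_rate)
        writer.writeframes(b"\x00\x00" * num_samples)
    buf.seek(0)
    with wave.open(buf, "rb") as reader:
        return reader.getnframes() / reader.getframerate()


MODEL_RATE = 44_100  # assumption: rate at which the samples were generated
WRONG_RATE = 24_000  # assumption: a mismatched rate written to the header

num_samples = MODEL_RATE * 2  # two seconds of audio at the model's rate

correct = wav_duration_seconds(num_samples, MODEL_RATE)
stretched = wav_duration_seconds(num_samples, WRONG_RATE)

print(f"correct header:    {correct:.3f} s")
print(f"mismatched header: {stretched:.3f} s")
print(f"slowdown factor:   {stretched / correct:.4f}")
```

Running the same check on the real output file (reading its frame rate with `wave.open(path, "rb").getframerate()` after converting to WAV) would show whether the header rate matches what the model is expected to produce.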