Bug running Speech to Speech on Server: AttributeError when voice parameter is passed as float
Reproduction
- Run mlx_audio.server
- With the server running, navigate to the GUI
- Select "Speech to Speech"
- Click "Start Stream"
- Talk to the stream
- The GUI shows "Error: undefined" and the server prints the stack trace below
Further Description Upon Inspection
The TTS server crashes with an AttributeError: 'float' object has no attribute 'split' when a voice parameter is passed as a float value instead of a string during speech-to-speech processing.
Expected Behavior
The system should either:
- Accept float values for voice parameters and handle the type conversion internally, or
- Provide a clear validation error message indicating that voice parameters must be strings
Actual Behavior
The server crashes with an AttributeError when attempting to call .split() on a float value.
Steps to Reproduce
- Start the mlx_audio server
- Send a speech-to-speech request where the voice parameter is passed as a float value
- The server crashes with the traceback below
Error Traceback
INFO: 127.0.0.1:52932 - "POST /speech_to_speech_input HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/.../site-packages/fastrtc/reply_on_pause.py", line 400, in emit
    output = next(self.generator)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/mlx_audio/server.py", line 74, in speech_to_speech_handler
    for segment in tts_model.generate(
                   ^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/mlx_audio/tts/models/kokoro/kokoro.py", line 288, in generate
    for segment_idx, (graphenes, phonemes, audio) in enumerate(
                                                     ^^^^^^^^^^
  File "/.../site-packages/mlx_audio/tts/models/kokoro/pipeline.py", line 369, in __call__
    pack = self.load_voice(voice) if self.model else None
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/mlx_audio/tts/models/kokoro/pipeline.py", line 157, in load_voice
    packs = [self.load_single_voice(v) for v in voice.split(delimiter)]
                                                ^^^^^^^^^^^
AttributeError: 'float' object has no attribute 'split'
Root Cause
The failure is at line 157 of pipeline.py, where load_voice assumes the voice parameter is a string and calls .split(delimiter) on it. A float has no .split method, so the call raises AttributeError.
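The failure mode can be reproduced outside the server with a few lines of plain Python. This is only a sketch: split_voices is a hypothetical stand-in for the splitting step in load_voice, not the actual mlx_audio code, and the "," delimiter and the voice names are assumptions chosen for illustration.

```python
# Hypothetical stand-in for the splitting step inside load_voice.
# The "," delimiter and the voice names are assumptions for illustration.
def split_voices(voice, delimiter=","):
    return voice.split(delimiter)


# A string splits into one entry per voice name.
print(split_voices("af_heart,af_bella"))

# A float has no .split method, so the same call raises AttributeError.
try:
    split_voices(1.0)
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

Running it shows that the error comes purely from the type of the argument, independent of the model or the audio pipeline.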
Environment
- Python 3.12
- mlx_audio (latest version)
- macOS
Suggested Fix
Add type validation in the load_voice method to either:
- Convert float/numeric values to strings before processing
- Raise a clear ValueError with instructions on expected parameter types
def load_voice(self, voice):
    if not isinstance(voice, str):
        if isinstance(voice, (int, float)):
            # Coerce numeric values to a string
            voice = str(voice)
        else:
            raise ValueError("Voice parameter must be a string")
    # ... rest of the method
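The two options in the suggested fix can be compared in a standalone sketch. normalize_voice and its coerce flag are hypothetical names introduced here for illustration; they are not part of mlx_audio, and the real fix would live inside load_voice or in the server handler that calls it.

```python
def normalize_voice(voice, coerce=False):
    """Return voice as a string, or raise ValueError if it is not one.

    Hypothetical helper for illustration; not part of mlx_audio.
    """
    if isinstance(voice, str):
        return voice
    # bool is a subclass of int, so exclude it from numeric coercion.
    is_number = isinstance(voice, (int, float)) and not isinstance(voice, bool)
    if coerce and is_number:
        return str(voice)
    raise ValueError(
        f"Voice parameter must be a string, got {type(voice).__name__}: {voice!r}"
    )


# Strings pass through unchanged.
print(normalize_voice("af_heart"))

# With coercion enabled, a float becomes its string form.
print(normalize_voice(1.0, coerce=True))

# Without coercion, the caller gets a descriptive ValueError instead of
# an AttributeError raised deep inside the pipeline.
try:
    normalize_voice(1.0)
except ValueError as exc:
    print(exc)
```

Coercion turns 1.0 into "1.0", which probably does not name a real voice, so the strict variant may give the more useful error message in practice.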
Sorry for that.
The speech to speech demo is due for a remake.
You can use our pipecat integration for the time being:
https://x.com/kwindla/status/1960447000132116562