Bug running Speech to Speech on Server: AttributeError when voice parameter is passed as float

Open · JohnYangSam opened this issue 5 months ago · 1 comment

Reproduction

Run mlx_audio.server.
With the server running, navigate to the GUI.
Try "Speech to Speech".
Start Stream.
Talk to the stream.
The GUI shows "Error: undefined" and the server logs the stack trace below.

Further Description Upon Inspection

During speech-to-speech processing, the TTS server crashes with "AttributeError: 'float' object has no attribute 'split'" when the voice parameter is passed as a float instead of a string.

Expected Behavior

The system should either:

Accept float values for voice parameters and handle the type conversion internally, or
Provide a clear validation error message indicating that voice parameters must be strings

Actual Behavior

The server crashes with an AttributeError when attempting to call .split() on a float value.

Steps to Reproduce

Start the mlx_audio server
Send a speech-to-speech request where the voice parameter is passed as a float value
The server crashes with the traceback below

Error Traceback

```text
INFO: 127.0.0.1:52932 - "POST /speech_to_speech_input HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/.../site-packages/fastrtc/reply_on_pause.py", line 400, in emit
    output = next(self.generator)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/mlx_audio/server.py", line 74, in speech_to_speech_handler
    for segment in tts_model.generate(
                   ^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/mlx_audio/tts/models/kokoro/kokoro.py", line 288, in generate
    for segment_idx, (graphenes, phonemes, audio) in enumerate(
                                                     ^^^^^^^^^^
  File "/.../site-packages/mlx_audio/tts/models/kokoro/pipeline.py", line 369, in __call__
    pack = self.load_voice(voice) if self.model else None
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/mlx_audio/tts/models/kokoro/pipeline.py", line 157, in load_voice
    packs = [self.load_single_voice(v) for v in voice.split(delimiter)]
                                                ^^^^^^^^^^^
AttributeError: 'float' object has no attribute 'split'
```

Root Cause

The failure occurs at line 157 of pipeline.py, inside load_voice, which assumes the voice parameter is a string and calls .split(delimiter) on it. A float has no split method, so the call raises AttributeError.
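The failure can be reproduced without the server or the GUI. The sketch below uses a hypothetical split_voices function as a stand-in for the string handling in load_voice (it is not the mlx_audio code); the "," default delimiter and the voice names are assumptions for illustration. Passing a float hits the same .split() call and raises the same AttributeError.

```python
def split_voices(voice, delimiter=","):
    """Hypothetical stand-in for the splitting step in load_voice.

    Like the failing line in pipeline.py, it assumes `voice` is a str.
    """
    return voice.split(delimiter)


def reproduce():
    """Call split_voices with a float and return the error message, if any."""
    try:
        split_voices(1.0)
    except AttributeError as exc:
        return str(exc)
    return None


print(split_voices("voice_a,voice_b"))  # ['voice_a', 'voice_b']
print(reproduce())  # 'float' object has no attribute 'split'
```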

Environment

Python 3.12
mlx_audio (latest version)
macOS

Suggested Fix

Add type validation in the load_voice method to either:

Convert float/numeric values to strings before processing
Raise a clear ValueError with instructions on expected parameter types

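```python
def load_voice(self, voice):
    # Guard against non-string voices (e.g. a float from the speech-to-speech handler)
    if not isinstance(voice, str):
        if isinstance(voice, (int, float)):
            # Coerce numeric values to their string form
            voice = str(voice)
        else:
            raise ValueError("Voice parameter must be a string")
    # ... rest of the method
```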
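The same guard can be written as a standalone helper so it is easy to test outside the pipeline class. This is a sketch, not the mlx_audio implementation: normalize_voice and split_voices are hypothetical names, and the "," delimiter is an assumption. It also excludes bool explicitly, since bool is a subclass of int and would otherwise be coerced to "True" or "False".

```python
def normalize_voice(voice):
    """Hypothetical helper: return `voice` as a str or raise a clear error."""
    if isinstance(voice, str):
        return voice
    # bool is a subclass of int, so reject it explicitly
    if isinstance(voice, (int, float)) and not isinstance(voice, bool):
        # Option 1 from the issue: coerce numbers to strings
        return str(voice)
    # Option 2 from the issue: fail with a message naming the received type
    raise ValueError(
        f"Voice parameter must be a string, got {type(voice).__name__}"
    )


def split_voices(voice, delimiter=","):
    """Hypothetical stand-in for load_voice's splitting step, now guarded."""
    return normalize_voice(voice).split(delimiter)


print(split_voices("voice_a,voice_b"))  # ['voice_a', 'voice_b']
print(split_voices(1.5))  # ['1.5']
```

Note that coercion only moves the failure: str(1.5) is "1.5", which is unlikely to match any voice file, so the subsequent voice lookup would probably still fail. Raising the ValueError, and tracing why the speech-to-speech handler passes a float at all, is likely the more useful fix for this bug.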

Aug 31 '25 17:08 JohnYangSam

Sorry for that.

The speech to speech demo is due for a remake.

You can use our pipecat integration for the time being:

https://x.com/kwindla/status/1960447000132116562

Sep 01 '25 18:09 Blaizzy