API choice feedback

Open • muellerzr opened this issue 6 months ago • 1 comment

Just some notes as I tried it out today @Blaizzy :)

Overall awesome 🚀

The main pain point I hit, however, was when I ran:

generate_audio(
    text=("Watch, as he sits there and questions why he is even here. What is his purpose? Why does he matter."),
    model_path=some_model,
    file_prefix="output",
    audio_format="wav",
    sample_rate=24000,
    join_audio=True,
    verbose=True  # Set to False to disable print messages
)

without specifying a voice. It tried to download a default voice rather than one actually available for the model itself, leading to a confusing huggingface_hub error. IMO these (HF Hub errors) should be avoided at all costs, with more relevant, specific errors given instead. E.g. I'd expect to not see that at all and instead get an error stating "voice xyz was not found for the model locally or on the Hub. Verify this voice exists"
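To make the suggestion concrete, here is a rough sketch of the kind of error translation I mean. Everything in it is hypothetical and not the library's actual code: `VoiceNotFoundError`, `resolve_voice`, the `fetch_from_hub` callable (a stand-in for whatever download call is used internally), and the `.pt` file layout are all assumptions for illustration.

```python
from pathlib import Path
from typing import Callable


class VoiceNotFoundError(FileNotFoundError):
    """Hypothetical error type raised when a voice cannot be resolved."""


def resolve_voice(
    voice: str,
    local_dir: Path,
    fetch_from_hub: Callable[[str], Path],
) -> Path:
    """Return a path to the voice file, checking locally before the Hub.

    `fetch_from_hub` stands in for the real download call; any exception
    it raises is translated into a VoiceNotFoundError.
    """
    local_path = local_dir / f"{voice}.pt"  # file layout is an assumption
    if local_path.exists():
        return local_path
    try:
        return fetch_from_hub(voice)
    except Exception as exc:
        # A real implementation would likely catch only the Hub's
        # "not found" exceptions; the original error is kept as the cause.
        raise VoiceNotFoundError(
            f"Voice '{voice}' was not found for the model locally or on "
            "the Hub. Verify this voice exists."
        ) from exc
```

The original exception stays attached via `from exc`, so the low-level Hub traceback is still available for debugging while the user sees the voice-specific message first.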

Second issue (small):

I'm used to an API where someone puts in a filename such as voice.wav instead of voice="voice". As a result it gave an error, because we appended .wav to it. Would be nice to just flag whether we do this or not, and if not, don't append (let users have as much freedom as they can get for specifying files)
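One possible reading of this, sketched below: only append the extension when the caller did not already supply one. `voice_filename` and `default_suffix` are hypothetical names made up for this example, not part of the existing API.

```python
from pathlib import Path


def voice_filename(voice: str, default_suffix: str = ".wav") -> str:
    """Append `default_suffix` only when `voice` has no extension of its own."""
    if Path(voice).suffix:
        # The caller already gave an extension, so leave the name untouched.
        return voice
    return voice + default_suffix
```

With this, both `voice_filename("voice")` and `voice_filename("voice.wav")` give `"voice.wav"`. One caveat: a voice name containing a dot (say `v1.2`) would be treated as already having an extension, so an explicit opt-out flag might still be worth having.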

Aug 24 '25 20:08 muellerzr

Hey Zach

Thanks for the awesome feedback!

Indeed we need more detailed errors for voice cloning and default voices. Will work on it.

Regarding the second point: for voice cloning, which uses .wav files, you usually pass ref_audio and optionally ref_text. The voice argument is set to a name because Kokoro and a few other models use torch tensors instead of .wav files. But you are right, I will unify them into a single argument.

Sep 01 '25 17:09 Blaizzy